Abstract

This paper considers testing a covariance matrix $\Sigma$ in the high-dimensional setting
where the dimension $p$ can be comparable to or much larger than the sample size $n$.
The problem of testing the hypothesis $H_0 : \Sigma = \Sigma_0$ for a given covariance matrix
$\Sigma_0$ is studied from a minimax point of view. We first characterize the boundary that
separates the testable region from the non-testable region by the Frobenius norm when
the ratio of the dimension $p$ to the sample size $n$ is bounded. A test based on a
$U$-statistic is introduced and is shown to be rate optimal over this asymptotic regime.
Furthermore, it is shown that the power of this test uniformly dominates that of the
corrected likelihood ratio test (CLRT) over the entire asymptotic regime under which
the CLRT is applicable. The power of the $U$-statistic based test is also analyzed when
$p/n$ is unbounded.
1 Introduction

Covariance structure plays a fundamental role in multivariate analysis, and testing the
covariance matrix is an important problem. Let $X_1, \dots, X_n$ be $n$ independent and identically
distributed $p$-vectors following a multivariate normal distribution $N_p(0, \Sigma)$. A hypothesis
testing problem of significant interest is testing

$$H_0 : \Sigma = I. \qquad (1)$$

Note that any null hypothesis $H_0 : \Sigma = \Sigma_0$ with a given positive definite covariance matrix
$\Sigma_0$ is equivalent to (1), since one can always transform $X_i$ to $\tilde X_i = \Sigma_0^{-1/2} X_i$ and then test
(1) based on the transformed data.
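
The reduction to the identity null can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `inverse_sqrt` and `whiten` are hypothetical, the observations are stored as the rows of an `(n, p)` array, and the symmetric inverse square root is computed through an eigendecomposition.

```python
import numpy as np

def inverse_sqrt(Sigma0):
    """Symmetric inverse square root of a positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(Sigma0)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

def whiten(X, Sigma0):
    """Apply X_i -> Sigma0^{-1/2} X_i to every row X_i of the (n, p) array X."""
    # The inverse square root is symmetric, so right-multiplying the row
    # vectors is the same as left-multiplying the column vectors.
    return X @ inverse_sqrt(Sigma0)
```

If the rows of `X` have covariance `Sigma0`, the rows of `whiten(X, Sigma0)` have covariance equal to the identity, so a test of (1) can be applied to the transformed sample.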

This testing problem has been well studied in the classical setting of small $p$ and large
$n$. See, for example, Anderson [1] and Muirhead [13]. In particular, the likelihood ratio test
(LRT) is commonly used. Driven by a wide range of contemporary scientific applications,
analysis of high dimensional data is of significant current interest. In the high dimensional
setting, where the dimension can be comparable to or even much larger than the sample
size, the conventional testing procedures such as the LRT perform poorly or are not even
well defined. Several testing procedures designed for the high-dimensional setting have been
proposed. Let $S = \frac{1}{n}\sum_{i=1}^n X_i X_i'$ be the sample covariance matrix. The existing tests for
(1) in the literature can be categorized as follows according to the asymptotic regime
under which they are suitable:

• $p$ fixed and $n \to \infty$. In this classical asymptotic regime, conventional tests for (1)
include the likelihood ratio test (LRT) [1], Roy's largest root test [16], and Nagao's
test [14]. In particular, the LRT statistic is $LR_n = nL_n$, where

$$L_n = \mathrm{tr}(S) - \log\det(S) - p.$$

The asymptotic distribution of $LR_n$ under $H_0$ is $\chi^2_{p(p+1)/2}$.

• Both $n, p \to \infty$ and $p/n \to c \in (0,\infty)$. Investigation in this asymptotic regime has
been very active in the past decade. For example, Johnstone [11] revisited Roy's
largest root test and derived the Tracy-Widom limit of its null distribution. Ledoit
and Wolf [12] proposed a new test based on Nagao's proposal. See also Srivastava
[17]. When $p$ grows, the chi-squared limiting null distribution of the LRT statistic
$LR_n$ is no longer valid. Recently, Bai, et al [2] proposed a corrected LRT when $c < 1$,
and Jiang, et al [10] extended it to the case when $p < n$ and $c = 1$. Here, for $c_n = p/n$,
the test statistic of the corrected LRT is

$$CLR_n = \frac{L_n - p\,[1 - (1 - c_n^{-1})\log(1 - c_n)] - \frac{1}{2}\log(1 - c_n)}{\sqrt{-2\log(1 - c_n) - 2c_n}}, \qquad (2)$$

whose asymptotic null distribution is $N(0, 1)$. Note that no test based on the
likelihood ratio can be defined when $p > n$ or $c > 1$.

• Both $n, p \to \infty$ and $p/n \to \infty$. This is the ultra high-dimensional setting and both
the LRT and corrected LRT are not well defined in this case. The testing problem
in this asymptotic regime is not as well studied as in the previous categories. Birke
and Dette [3] derived the asymptotic null distribution of the Ledoit-Wolf test under
the current asymptotic regime. More recently, Chen, et al [7] proposed a new test
statistic and derived its asymptotic null distribution when both $n, p \to \infty$, regardless
of the limiting behavior of $p/n$.
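
To make the classical statistic concrete, the following minimal sketch computes $L_n = \mathrm{tr}(S) - \log\det(S) - p$ from an `(n, p)` data array with known zero mean. The function name `lrt_statistic` and the choice to raise an error when $p > n$ are illustrative assumptions, not part of the original text; the error reflects the fact noted above that the sample covariance matrix is singular in that case.

```python
import numpy as np

def lrt_statistic(X):
    """Return L_n = tr(S) - log det(S) - p, where S = (1/n) * sum_i X_i X_i'."""
    n, p = X.shape
    if p > n:
        # S has rank at most n < p, so log det(S) is not defined.
        raise ValueError("S is singular when p > n; the LRT statistic is undefined")
    S = X.T @ X / n
    # slogdet is numerically safer than log(det(S)) for moderately large p.
    _, logdet = np.linalg.slogdet(S)
    return float(np.trace(S) - logdet - p)
```

Multiplying the returned value by $n$ gives $LR_n = nL_n$ as in the first regime above.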

When the dimension $p$ grows together with the sample size $n$, the focus of most of the
aforementioned papers is mainly on finding the asymptotic null distribution of the proposed
test statistic, so the significance level of the test can be controlled. The few exceptions
include Srivastava [17] and Chen, et al [7], where the asymptotic pointwise power of the
proposed tests is also studied. Recently, Onatski, et al [15] established the regime of mutual
contiguity of the joint distributions of the sample eigenvalues under the null and under the
special alternative of rank one perturbation to the identity matrix, and then applied Le
Cam's third lemma to study the pointwise power of a collection of eigenvalue based tests
for (1) against this special class of alternatives.

In the present paper we investigate this testing problem in the high-dimensional
settings from a minimax point of view. Consider testing (1) against a composite alternative
hypothesis

$$H_1 : \Sigma \in \Theta, \quad \text{where } \Theta = \Theta_n = \{\Sigma : \|\Sigma - I\|_F \ge \epsilon_n\}. \qquad (3)$$

Here, $\|A\|_F = (\sum_{i,j} a_{ij}^2)^{1/2}$ denotes the Frobenius norm of a matrix $A = (a_{ij})$. It is clear
that the difficulty of testing between $H_0$ and $H_1$ depends on the value of $\epsilon_n$: the smaller
$\epsilon_n$ is, the harder it is to distinguish between the two hypotheses. An interesting question
is: What is the boundary that separates the testable region, where it is possible to reliably
detect the alternative based on the observations, from the untestable region, where it is
impossible to do so? This problem is connected to the classical contiguity theory. It is also
important to construct a test that can optimally distinguish between the two hypotheses
in the testable region. The high-dimensional settings here include all the cases where the
dimension $p = p_n \to \infty$ as the sample size $n \to \infty$, and there is no restriction on the limit
of $p/n$ unless otherwise stated.

For a given significance level $0 < \alpha < 1$, our first goal is to identify the separation
rate $\epsilon_n$ at which there exists a test $\phi$ based on the random sample $\{X_1, \dots, X_n\}$ such that

$$\inf_{\Sigma \in \Theta} P_\Sigma(\phi \text{ rejects } H_0) \ge \beta > \alpha.$$

Hence the test is able to detect any alternative that is separated away from the null by a
certain distance $\epsilon_n$ with a guaranteed power $\beta > \alpha$. Our second goal is to construct such a
testing procedure $\phi$.

The major contribution of the current paper is threefold. First, we show that if $p/n$ is
bounded, then the rate $\epsilon_n$ needs to be no less than $b\sqrt{p/n}$ for some constant $b$. In addition,
it is shown that if $\epsilon_n = b\sqrt{p/n}$, there exists a test $\psi$ of significance level $\alpha$, such that
$\lim_{n\to\infty} \inf_{\Theta} P_\Sigma(\psi \text{ rejects } H_0) > \alpha$ and the power tends to 1 if $b = b_n \to \infty$. The test is
motivated by the proposal in Chen, et al [7]. [We use $\psi$ to denote the specific test that we
construct, while $\phi$ is used to denote a generic test.] Here, we no longer require $p/n$ to be
bounded, and the explicit expression for the asymptotic power of $\psi$ is also given. Moreover,
we show that the asymptotic power of $\psi$ on $\Theta_n$ uniformly dominates that of the corrected
LRT by Bai, et al [2] and Jiang, et al [10] over the entire asymptotic regime under which
the corrected LRT is defined, i.e., $p < n$ and $p/n \to c \in (0, 1]$.

The rest of the paper is organized as follows. In Section 2, after introducing basic
notation and definitions, we establish a lower bound of the separation rate $\epsilon_n$. Section 3
introduces the test based on a $U$-statistic and provides a Berry-Esseen bound for its weak
convergence to the normal limit under both the null and the alternative hypotheses, which
leads to the establishment of its guaranteed power over $\Theta$ when $\epsilon_n = b\sqrt{p/n}$. Furthermore,
we also show that the power of this test uniformly dominates that of the corrected LRT.
The theoretical results are supported by the numerical experiments in Section 4. Further
discussions on the connections of our results and those of related testing problems are given
in Section 5. The main results are proved in Section 6.

2 Lower bound 

In this section, we establish a lower bound for the separation rate $\epsilon_n$ in $\Theta$. The result in
Section 3 will show that this lower bound is rate-optimal. The lower and upper bounds
together characterize the separation boundary between the testable and non-testable regions
when the ratio of the dimension $p$ to the sample size $n$ is bounded. This separation
boundary can then be used as a minimax benchmark for the evaluation of the performance
of a test in this asymptotic regime.

We begin with basic notation and definitions. Throughout the paper, a test $\phi = \phi(X_1, \dots, X_n)$
refers to a measurable function which maps $X_1, \dots, X_n$ to the closed
interval $[0, 1]$, where the value stands for the probability of rejecting $H_0$. So, the
significance level of $\phi$ is $P_I(\phi \text{ rejects } H_0) = \mathbb{E}_I \phi$, and its power at a certain alternative $\Sigma$ is
$P_\Sigma(\phi \text{ rejects } H_0) = \mathbb{E}_\Sigma \phi$. Here and after, $P_\Sigma$, $\mathbb{E}_\Sigma$, $\mathrm{Var}_\Sigma$ and $\mathrm{Cov}_\Sigma$ denote the induced
probability measure, expectation, variance and covariance when $X_1, \dots, X_n \stackrel{iid}{\sim} N_p(0, \Sigma)$. The
subscript is shown only when clarity dictates.

To state the lower bound result, let $\epsilon_n = b\sqrt{p/n}$ for some constant $b$, and define

$$\Theta(b) = \{\Sigma : \|\Sigma - I\|_F \ge b\sqrt{p/n}\}. \qquad (4)$$

Theorem 1 (Lower bound). Let $0 < \alpha < \beta < 1$. Suppose that as $n \to \infty$, $p \to \infty$
and that $p/n \le \kappa$ for some constant $\kappa < \infty$ and all $n$. Then there exists a constant
$b = b(\kappa, \beta - \alpha) < 1$, such that for any test $\phi$ with significance level $\alpha$ for testing $H_0 : \Sigma = I$,

$$\limsup_{n\to\infty} \inf_{\Sigma \in \Theta(b)} \mathbb{E}_\Sigma \phi < \beta.$$

Theorem 1 shows that no level $\alpha$ test for (1) can distinguish between the two hypotheses
with power tending to 1 as $n$ and $p$ grow, when the separation rate $\epsilon_n$ is of order $\sqrt{p/n}$.
Hence, it provides a lower bound for the separation rate.

We now give an outline of the proof for Theorem 1, while the complete proof is provided
in Section 6.1. Consider the following "least favorable" subset of $\Theta(b)$:



$$\Theta^*(b) = \Big\{ I_{p\times p} + [\,\ldots\,]\, vv' : v \in \{\pm 1\}^p \Big\}. \qquad (5)$$



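With slight abuse of notation, let $P_0$ be the probability measure when $X_1, \dots, X_n \stackrel{iid}{\sim} N_p(0, I)$
and $P_v$ the probability measure when $X_1, \dots, X_n \stackrel{iid}{\sim} N_p(0, \Sigma_v)$, where $\Sigma_v$ denotes the
element of $\Theta^*(b)$ indexed by $v$. In addition, let $P_1 = \frac{1}{2^p}\sum_{v \in \{\pm 1\}^p} P_v$ be the average
measure of the $P_v$'s. Then for any test $\phi$, the sum of
probabilities of its two types of errors satisfies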



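$$\sup_v \big[\mathbb{E}_0\phi + \mathbb{E}_v(1 - \phi)\big] \ge \inf_\psi \sup_v \big[\mathbb{E}_0\psi + \mathbb{E}_v(1 - \psi)\big] \ge \inf_\psi \big[\mathbb{E}_0\psi + \mathbb{E}_1(1 - \psi)\big] = 1 - \frac{1}{2}\|P_1 - P_0\|_1.$$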



Here, $\mathbb{E}_0$, $\mathbb{E}_v$ and $\mathbb{E}_1$ denote the expectation under $P_0$, $P_v$, and $P_1$ respectively, and $\|P_1 - P_0\|_1$
is the $L_1$ distance between $P_0$ and $P_1$. Thus, we obtain



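$$\inf_{\Sigma \in \Theta(b)} \mathbb{E}_\Sigma\phi \le \inf_v \mathbb{E}_v\phi \le \mathbb{E}_0\phi + \frac{1}{2}\|P_1 - P_0\|_1 \le \alpha + \frac{1}{2}\|P_1 - P_0\|_1.$$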



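To control the rightmost side, we bound the $L_1$ distance by the chi-square divergence as

$$\|P_1 - P_0\|_1^2 \le \mathbb{E}_0\Big[\frac{dP_1}{dP_0} - 1\Big]^2 = \mathbb{E}_0\Big[\frac{dP_1}{dP_0}\Big]^2 - 1 = \int \frac{f_1^2}{f_0} - 1,$$

where $f_i$ is the density function of $P_i$ for $i = 0, 1$. So, the proof can be completed by
showing that for an appropriate choice of the constant $b$, one obtains $\int f_1^2/f_0 - 1 < 4(\beta - \alpha)^2$.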
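
The two inequalities used in this outline can be illustrated on a toy example. The sketch below is not part of the proof: it replaces the Gaussian measures by two discrete distributions, and the helper names are hypothetical. It shows the Cauchy-Schwarz bound of the squared $L_1$ distance by the chi-square divergence, and how a chi-square divergence of $4(\beta-\alpha)^2$ turns into the power bound $\beta$.

```python
import math

def l1_distance(p1, p0):
    """L1 distance between two discrete distributions given as lists."""
    return sum(abs(a - b) for a, b in zip(p1, p0))

def chi_square_divergence(p1, p0):
    """Chi-square divergence: sum of p1^2 / p0, minus 1."""
    return sum(a * a / b for a, b in zip(p1, p0)) - 1.0

def power_upper_bound(alpha, chi2):
    """Bound alpha + (1/2) * sqrt(chi2) on the worst-case power of a level-alpha test."""
    return alpha + 0.5 * math.sqrt(chi2)
```

For `p0 = [0.25] * 4` and `p1 = [0.4, 0.3, 0.2, 0.1]`, the squared $L_1$ distance is 0.16 and the chi-square divergence is 0.2, in line with the inequality above.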
Remark 1. (a). Note that all the covariance matrices in the least favorable configuration
$\Theta^*(b)$ defined in (5) have diagonal elements all equal to 1. Thus, they are also correlation
matrices. So the proof of Theorem 1 readily establishes an analogous lower bound result
on testing $H_0 : R = I$ with $R$ the population correlation matrix.

(b). The lower bound argument here does not extend to the case when $p/n$ is unbounded,
because the chi-square divergence becomes unbounded.



3 Upper bound 

In this section, we show that there exists a level $\alpha$ test whose power over $\Theta_n$ is uniformly
larger than a prescribed value $\beta > \alpha$, if $\epsilon_n = b\sqrt{p/n}$ for a large enough constant $b$. This
matches the lower bound result in Theorem 1 when $p/n$ is bounded. In addition, the results
in the current section remain valid even when $p/n$ is unbounded.

We first introduce the test statistic in Section 3.1, followed by a study on the rate of
convergence of its distribution to the normal limit under both the null and the alternative
hypotheses in Section 3.2. Section 3.3 then uses the rate of convergence result to study the
asymptotic power of the proposed test. Finally, Section 3.4 shows that the test dominates
the corrected LRT in $\Theta$ when $p/n \to c \in (0, 1]$.

3.1 Test statistic 

Given a random sample $X_1, \dots, X_n \stackrel{iid}{\sim} N_p(0, \Sigma)$, a natural approach to test between (1)
and (3) is to first estimate the squared Frobenius norm $\|\Sigma - I\|_F^2 = \mathrm{tr}(\Sigma - I)^2$ by some
statistic $T_n = T_n(X_1, \dots, X_n)$ and then reject the null hypothesis if $T_n$ is too large. To
estimate $\|\Sigma - I\|_F^2 = \mathrm{tr}(\Sigma - I)^2$, note that $\mathbb{E}_\Sigma h(X_1, X_2) = \mathrm{tr}(\Sigma - I)^2$ where

$$h(X_1, X_2) = (X_1'X_2)^2 - (X_1'X_1 + X_2'X_2) + p. \qquad (6)$$

Therefore, $\mathrm{tr}(\Sigma - I)^2$ can be estimated by the following $U$-statistic

$$T_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} h(X_i, X_j), \qquad (7)$$

for which we have

$$\mu_n(\Sigma) = \mathbb{E}_\Sigma(T_n) = \mathrm{tr}(\Sigma - I)^2, \qquad (8)$$

$$\sigma_n^2(\Sigma) = \mathrm{Var}_\Sigma(T_n) = \frac{4}{n(n-1)}\big[\mathrm{tr}^2(\Sigma^2) + \mathrm{tr}(\Sigma^4)\big] + \frac{8}{n}\,\mathrm{tr}\big(\Sigma^2(\Sigma - I)^2\big). \qquad (9)$$

Here, verifying (8) is straightforward, and (9) is proved in Appendix A.2. For the $U$-statistic
$T_n$, the proof of Theorem 2 of Chen, et al [7] essentially established the following.
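
As a quick illustration of the mean and variance of $T_n$, the sketch below evaluates them for a given covariance matrix. It assumes the variance takes the form $\frac{4}{n(n-1)}[\mathrm{tr}^2(\Sigma^2) + \mathrm{tr}(\Sigma^4)] + \frac{8}{n}\mathrm{tr}(\Sigma^2(\Sigma - I)^2)$, which is how (9) is read here; the function names `mu_n` and `sigma2_n` are hypothetical.

```python
import numpy as np

def mu_n(Sigma):
    """Mean of T_n: tr((Sigma - I)^2), the squared Frobenius distance to I."""
    D = Sigma - np.eye(Sigma.shape[0])
    return float(np.trace(D @ D))

def sigma2_n(Sigma, n):
    """Variance of T_n under N_p(0, Sigma), assuming the form of (9) stated above."""
    p = Sigma.shape[0]
    S2 = Sigma @ Sigma
    D = Sigma - np.eye(p)
    term1 = 4.0 / (n * (n - 1)) * (np.trace(S2) ** 2 + np.trace(S2 @ S2))
    term2 = 8.0 / n * np.trace(S2 @ D @ D)
    return float(term1 + term2)
```

At $\Sigma = I$ the second term vanishes and the variance reduces to $4p(p+1)/[n(n-1)]$, which is the quantity used to calibrate the test under the null.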
Proposition 2 (Theorem 2 of [7]). Suppose that $p \to \infty$ as $n \to \infty$. If a sequence of
covariance matrices satisfies $\mathrm{tr}(\Sigma^2) \to \infty$ and $\mathrm{tr}(\Sigma^4)/\mathrm{tr}^2(\Sigma^2) \to 0$ as $n \to \infty$, then under
$P_\Sigma$, we have

$$\frac{T_n - \mu_n(\Sigma)}{\sigma_n(\Sigma)} \Rightarrow N(0, 1).$$

Note that as $p \to \infty$, the identity matrix $I_{p\times p}$ satisfies the condition of the above
proposition. Also note that $\mu_n(I) = 0$ and $\sigma_n^2(I) = \frac{4p(p+1)}{n(n-1)}$. Thus, Proposition 2 quantifies
the behavior of $T_n$ under $H_0$, and we could define the test as follows: For any
$\alpha \in (0, 1)$, an asymptotic level $\alpha$ test based on $T_n$ is given by

$$\psi = I\Big(T_n > z_{1-\alpha} \cdot 2\sqrt{\frac{p(p+1)}{n(n-1)}}\Big). \qquad (10)$$

Here, $I(\cdot)$ is the indicator function, and $z_{1-\alpha}$ denotes the $100 \times (1 - \alpha)$th percentile of the
standard normal distribution. This test is motivated by the test introduced in Chen, et al
[7], while the original proposal in [7] involves higher order symmetric functions of the $X_i$'s.
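
The statistic and the test can be computed without an explicit double loop. The following is a minimal sketch, assuming the data are the rows of an `(n, p)` array with known zero mean; the function names are hypothetical. It uses the Gram matrix $G = XX'$, for which $\sum_{i \ne j} G_{ij}^2$ and $\sum_i G_{ii}$ give the two parts of the kernel $h(X_i, X_j) = (X_i'X_j)^2 - (X_i'X_i + X_j'X_j) + p$.

```python
import math
from statistics import NormalDist

import numpy as np

def u_statistic(X):
    """U-statistic T_n = 2/(n(n-1)) * sum_{i<j} h(X_i, X_j)."""
    n, p = X.shape
    G = X @ X.T                       # Gram matrix, G[i, j] = X_i' X_j
    diag = np.diag(G)
    # Sum over ordered pairs i != j of (X_i' X_j)^2.
    off_sq = np.sum(G ** 2) - np.sum(diag ** 2)
    return float(off_sq / (n * (n - 1)) - 2.0 * np.sum(diag) / n + p)

def psi_test(X, alpha=0.05):
    """Return True when the level-alpha test based on T_n rejects H0: Sigma = I."""
    n, p = X.shape
    z = NormalDist().inv_cdf(1.0 - alpha)
    threshold = z * 2.0 * math.sqrt(p * (p + 1) / (n * (n - 1)))
    return bool(u_statistic(X) > threshold)
```

The Gram matrix costs $O(n^2 p)$ operations, which stays manageable when $p$ is much larger than $n$ since no $p \times p$ matrix is formed.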

In addition to specifying the rejection region in (10), Proposition 2 can also be used
to study the asymptotic power of $\psi$ over a sequence of simple alternatives. However, to
understand the power of $\psi$ over the composite alternative $\Theta$ in (3), it is necessary to
understand the rate of convergence of $[T_n - \mu_n(\Sigma)]/\sigma_n(\Sigma)$ to the normal limit, which is the
central topic of the next subsection.



3.2 Rate of convergence 

We now study the rate of convergence for the distribution of $[T_n - \mu_n(\Sigma)]/\sigma_n(\Sigma)$ to its
normal limit in Kolmogorov distance. Let $\Phi(\cdot)$ be the cumulative distribution function of
the standard normal distribution. We have the following Berry-Esseen type bound.

Proposition 3. Under the condition of Proposition 2, there exists a numeric constant $C$
such that

$$\sup_x \Big| P_\Sigma\Big(\frac{T_n - \mu_n(\Sigma)}{\sigma_n(\Sigma)} \le x\Big) - \Phi(x) \Big| \le C\Big[\frac{1}{n} + \frac{\mathrm{tr}(\Sigma^4)}{\mathrm{tr}^2(\Sigma^2)}\Big]^{1/5}.$$

We outline the proof of Proposition 3 below, while the complete proof is deferred to
Section 6.2. The primary tool used in the proof is a Berry-Esseen type bound for the martingale
central limit theorem by Heyde and Brown [8].

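We begin by giving a martingale representation of $T_n - \mu_n(\Sigma)$. Let $X_i \stackrel{iid}{\sim} N_p(0, \Sigma)$.
Define the filtration

$$\mathcal{F}_k = \sigma\{X_1, \dots, X_k\}, \quad k = 1, \dots, n,$$

with $\mathcal{F}_0$ the trivial $\sigma$-field, and introduce the notation $\mathbb{E}_k[\cdot] = \mathbb{E}_\Sigma[\cdot \mid \mathcal{F}_k]$. Then,

$$T_n - \mu_n(\Sigma) = \sum_{k=1}^n \big(\mathbb{E}_k[T_n] - \mathbb{E}_{k-1}[T_n]\big) = \sum_{k=1}^n D_{nk}. \qquad (11)$$

Here, $\{D_{nk} : k = 1, \dots, n\}$ is a martingale difference sequence. The explicit expression for
$D_{nk}$ is

$$D_{nk} = \frac{2}{n(n-1)}\big[X_k'Q_{k-1}X_k - \mathrm{tr}(Q_{k-1}\Sigma)\big] + \frac{2}{n}\big[X_k'\Sigma X_k - \mathrm{tr}(\Sigma^2)\big] - \frac{2}{n}\big[X_k'X_k - \mathrm{tr}(\Sigma)\big], \qquad (12)$$

with $Q_{k-1} = \sum_{i=1}^{k-1}(X_iX_i' - \Sigma)$. Let $\sigma_{nk}^2 = \mathbb{E}_{k-1}[D_{nk}^2]$, and we have $\sigma_n^2(\Sigma) = \sum_{k=1}^n \mathbb{E}_\Sigma[\sigma_{nk}^2]$.
Under the current setup, the main theorem in [8] specializes to the following lemma.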

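Lemma 1. There exists a numeric constant $C$, such that

$$\sup_x \Big| P_\Sigma\Big(\frac{T_n - \mu_n(\Sigma)}{\sigma_n(\Sigma)} \le x\Big) - \Phi(x) \Big| \le C\Big\{\frac{1}{\sigma_n^4(\Sigma)}\Big(\sum_{k=1}^n \mathbb{E}_\Sigma[D_{nk}^4] + \mathbb{E}_\Sigma\Big[\sum_{k=1}^n \sigma_{nk}^2 - \sigma_n^2(\Sigma)\Big]^2\Big)\Big\}^{1/5}. \qquad (13)$$

Define

$$E_1 = \sum_{k=1}^n \mathbb{E}_\Sigma[D_{nk}^4] \quad \text{and} \quad E_2 = \mathbb{E}_\Sigma\Big[\sum_{k=1}^n \sigma_{nk}^2 - \sigma_n^2(\Sigma)\Big]^2. \qquad (14)$$

The proof of Proposition 3 could then be completed by showing that $E_1/\sigma_n^4(\Sigma) = O(1/n)$
and $E_2/\sigma_n^4(\Sigma) = O(\mathrm{tr}(\Sigma^4)/\mathrm{tr}^2(\Sigma^2))$. See Section 6.2 for details.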
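
The martingale decomposition can be checked numerically, since the differences must telescope to $T_n - \mu_n(\Sigma)$ for any data set. The sketch below assumes the explicit form $D_{nk} = \frac{2}{n(n-1)}[X_k'Q_{k-1}X_k - \mathrm{tr}(Q_{k-1}\Sigma)] + \frac{2}{n}[X_k'\Sigma X_k - \mathrm{tr}(\Sigma^2)] - \frac{2}{n}[X_k'X_k - \mathrm{tr}(\Sigma)]$ with $Q_{k-1} = \sum_{i<k}(X_iX_i' - \Sigma)$; the function names are hypothetical.

```python
import numpy as np

def u_statistic(X):
    """T_n computed from the Gram matrix of the rows of X."""
    n, p = X.shape
    G = X @ X.T
    diag = np.diag(G)
    off_sq = np.sum(G ** 2) - np.sum(diag ** 2)
    return float(off_sq / (n * (n - 1)) - 2.0 * np.sum(diag) / n + p)

def martingale_differences(X, Sigma):
    """Return the vector (D_n1, ..., D_nn) for data X and covariance Sigma."""
    n, p = X.shape
    Sigma2 = Sigma @ Sigma
    Q = np.zeros((p, p))              # Q_0 is the empty sum
    D = np.empty(n)
    for k in range(n):
        x = X[k]
        part1 = 2.0 / (n * (n - 1)) * (x @ Q @ x - np.trace(Q @ Sigma))
        part2 = 2.0 / n * (x @ Sigma @ x - np.trace(Sigma2))
        part3 = 2.0 / n * (x @ x - np.trace(Sigma))
        D[k] = part1 + part2 - part3
        Q += np.outer(x, x) - Sigma   # update Q_{k-1} to Q_k
    return D
```

Summing the returned vector and comparing with `u_statistic(X) - tr((Sigma - I)^2)` gives agreement up to floating point error, which is a useful sanity check on the signs and normalizing constants.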



3.3 Power of the test 



Equipped with Proposition 3, we now investigate the power of the test $\psi$ in (10) over the
composite alternative $H_1 : \Sigma \in \Theta(b)$, with $b < 1$, where $\Theta(b)$ is defined in (4). In particular,
we have the following result.

Theorem 4 (Upper bound). Suppose that $p \to \infty$ as $n \to \infty$. For any significance level
$\alpha \in (0, 1)$ and $\Theta(b)$ in (4), the power of the test $\psi$ in (10) satisfies

$$\lim_{n\to\infty} \inf_{\Theta(b)} \mathbb{E}_\Sigma\psi = 1 - \Phi\Big(z_{1-\alpha} - \frac{b^2}{2}\Big) > \alpha.$$

Moreover, for $b_n \to \infty$, $\lim_{n\to\infty} \inf_{\Theta(b_n)} \mathbb{E}_\Sigma\psi = 1$.
Theorem 4 shows that the test $\psi$ can distinguish between the null (1) and the alternative
(4) with power tending to 1 when $b = b_n \to \infty$. Comparing with the lower bound given
in Theorem 1, the test is rate-optimal when $p/n$ is bounded. When $\|\Sigma - I\|_F \asymp \sqrt{p/n}$,
the proof of Theorem 4 essentially shows that the power of $\psi$ is also monotone increasing
in $\|\Sigma - I\|_F$.

To prove Theorem 4, we first notice that the second claim is a direct consequence of the
first one. Indeed, if the first claim is true, then for any fixed constant $b > 0$,

$$\liminf_{n\to\infty} \inf_{\Theta(b_n)} \mathbb{E}_\Sigma\psi \ge \lim_{n\to\infty} \inf_{\Theta(b)} \mathbb{E}_\Sigma\psi = 1 - \Phi\Big(z_{1-\alpha} - \frac{b^2}{2}\Big).$$

Because the above inequality holds for any $b$, we obtain $\liminf_{n\to\infty} \inf_{\Theta(b_n)} \mathbb{E}_\Sigma\psi \ge 1$. On
the other hand, $\psi \le 1$ and so $\limsup_{n\to\infty} \inf_{\Theta(b_n)} \mathbb{E}_\Sigma\psi \le 1$. This leads to the second claim.

Turning to the proof of the first claim, we divide $\Theta(b)$ into two disjoint subsets $\Theta(b) =
\Theta(b, B) \cup \Theta(B)$, where

$$\Theta(b, B) = \{\Sigma : b\sqrt{p/n} \le \|\Sigma - I\|_F < B\sqrt{p/n}\}, \qquad \Theta(B) = \{\Sigma : \|\Sigma - I\|_F \ge B\sqrt{p/n}\}. \qquad (15)$$

Here, $B$ is a sufficiently large constant, the choice of which depends only on $\alpha$ and $b$, but not
on $n$ or $p$. We employ different proof strategies on the two subsets. On $\Theta(B)$, Chebyshev's
inequality readily shows that

$$\inf_{\Theta(B)} \mathbb{E}_\Sigma\psi \ge 1 - \Phi\Big(z_{1-\alpha} - \frac{b^2}{2}\Big).$$

Turn to $\Theta(b, B)$. On this subset, Proposition 3 then plays the key role in obtaining a
uniform approximation to the power function $\mathbb{E}_\Sigma\psi$ in terms of the normal distribution function
$\Phi\big(z_{1-\alpha} - \frac{\|\Sigma - I\|_F^2}{2p/n}\big)$, which in turn leads to the final claim. For a detailed proof, see Section 6.3.



Remark 2. (a). When $p/n$ is bounded, the conclusion of the theorem matches the lower
bound in Theorem 1. However, the result here holds even when $p/n$ is unbounded.
(b). It can be seen from the proof of Theorem 4 that the simple expression

$$1 - \Phi\Big(z_{1-\alpha} - \frac{\|\Sigma - I\|_F^2}{2p/n}\Big)$$

gives a good approximation to the power of the test $\psi$ defined in (10) at any $\Sigma$ of interest
in practice, because the approximation works well until the power of the test is extremely
close to $\alpha$ or 1.
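
The normal approximation to the power is easy to evaluate. The sketch below assumes the approximation takes the form $1 - \Phi\big(z_{1-\alpha} - \|\Sigma - I\|_F^2/(2p/n)\big)$, as suggested by Theorem 4; the function name `approx_power` is hypothetical, and only the Python standard library is used.

```python
from statistics import NormalDist

def approx_power(frob_norm_sq, n, p, alpha=0.05):
    """Normal approximation to the power of the test at squared distance frob_norm_sq."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha)
    # The null standard deviation of T_n is about 2p/n, so the shift of the
    # standardized statistic under the alternative is frob_norm_sq / (2p/n).
    return 1.0 - std_normal.cdf(z - frob_norm_sq / (2.0 * p / n))
```

At `frob_norm_sq = 0` the function returns the nominal level, and at `frob_norm_sq = b**2 * p / n` it returns the limit $1 - \Phi(z_{1-\alpha} - b^2/2)$ stated in Theorem 4.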
3.4 Power comparison with the corrected LRT

In the classical asymptotic regime where $p$ is fixed and $n \to \infty$, the likelihood ratio test
(LRT) is one of the most commonly used tests. In the high-dimensional setting where both
$n$ and $p$ are large and $p < n$, Bai, et al [2] showed that the LRT is not well behaved as the
chi-squared limiting distribution under $H_0$ no longer holds.

For testing (1), when $p < n$ and $p/n \to c \in (0, 1)$, Bai, et al [2] proposed a corrected
likelihood ratio test (CLRT) with the test statistic $CLR_n$ given in (2). It was shown that
the test statistic $CLR_n \Rightarrow N(0, 1)$ under $H_0$ and this leads to an asymptotically level $\alpha$ test
by rejecting $H_0$ when $CLR_n > z_{1-\alpha}$. It was shown that the CLRT significantly outperforms
the LRT when both $n$ and $p$ are large and $p < n$. Recently, Jiang, et al [10] also considered
the CLRT and showed that the above limit holds even when $p/n \to 1$.

It is interesting to compare the power of the CLRT with that of the test defined in
(10). Note that the test given in (10) is always well defined, but the CLRT is only properly
defined in the asymptotic regime where $p < n$ and $p/n \to c \in (0, 1]$. The following result
shows that the power of the CLRT is uniformly dominated by that of $\psi$ given in (10) over
the entire asymptotic regime under which the CLRT is applicable.

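Proposition 5. Suppose that as $n \to \infty$, $p \to \infty$ with $p < n$ and $p/n \to c \in (0, 1]$. Let
$\widetilde\Theta(\tau) = \{\Sigma : \|\Sigma - I\|_F = \tau\sqrt{p/n}\}$. Then for $\psi$ in (10) and the corrected LRT $\phi_{CLR}$, we
have

$$\lim_{n\to\infty} \inf_{\widetilde\Theta(\tau)} \mathbb{E}_\Sigma\psi > \limsup_{n\to\infty} \inf_{\widetilde\Theta(\tau)} \mathbb{E}_\Sigma\phi_{CLR}, \quad \text{for all } \tau \in (0, 1).$$

Moreover, for $\Theta(b)$ in (4) with $b \in (0, 1)$,

$$\lim_{n\to\infty} \inf_{\Theta(b)} \mathbb{E}_\Sigma\psi > \limsup_{n\to\infty} \inf_{\Theta(b)} \mathbb{E}_\Sigma\phi_{CLR}.$$

Hence, the CLRT is sub-optimal whenever it is properly defined. A proof of Proposition
5 is given in Section 6.4.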



4 Numerical experiments 

In this section a small simulation study is carried out to compare the power of the test $\psi$
defined in (10) with that of the CLRT under two specific alternatives.

The first alternative is the equi-correlation matrix $\Sigma = (\sigma_{ij})$, where for $\rho \in (0, 1)$,

$$\sigma_{ij} = \begin{cases} 1, & i = j, \\ \rho, & i \ne j. \end{cases}$$

Figure 1 (panels: $n = 80$, $p = 40$ and $n = 200$, $p = 40$): Power curves of the $\psi$ test (blue, solid) and the CLRT (red, dashed) under the
equi-correlation alternative. Each dot is obtained from 5000 repetitions, and the curves are
then obtained via linear interpolation.

Figure 1 shows how the power functions of the $\psi$ test and the CLRT grow with $\|\Sigma - I\|_F$
when $p = 40$ and $n = 80$ or 200. For both tests, the significance levels are fixed at
$\alpha = 0.05$. To make a fair comparison, the 95th percentiles of the null distributions of both
test statistics are obtained via simulation instead of using those of the asymptotic normal
distributions. From Figure 1, it is clear that the $\psi$ test is more powerful than the CLRT
for both $(n, p)$ configurations. The difference between the powers is smaller when $n/p$ is
larger. This is not surprising, because the LRT is a powerful test in the "large $n$, small $p$"
regime.

The second alternative is the tridiagonal matrix $\Sigma = (\sigma_{ij})$, where for $\rho \in (0, 1)$,

$$\sigma_{ij} = \begin{cases} 1, & i = j, \\ \rho, & |i - j| = 1, \\ 0, & |i - j| > 1. \end{cases}$$

Figure 2 shows how the power functions of the $\psi$ test and the CLRT grow with $\|\Sigma - I\|_F$
for the tridiagonal alternative. All the other setups remain unchanged. Here, the power of
the $\psi$ test still dominates, while the difference in power between the two tests is smaller.
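
For readers who wish to reproduce the setup, the two alternatives and their Frobenius distances from the identity can be generated as follows. This is a sketch only: the function names are hypothetical, and the simulation design of the paper (5000 repetitions, simulated critical values) is not reproduced here.

```python
import numpy as np

def equicorrelation(p, rho):
    """Matrix with ones on the diagonal and rho in every off-diagonal entry."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

def tridiagonal(p, rho):
    """Matrix with ones on the diagonal and rho on the first off-diagonals."""
    return np.eye(p) + rho * (np.eye(p, k=1) + np.eye(p, k=-1))

def frobenius_distance(Sigma):
    """Frobenius norm of Sigma - I, the x-axis quantity of the power curves."""
    return float(np.linalg.norm(Sigma - np.eye(Sigma.shape[0]), "fro"))
```

For the equi-correlation matrix the distance equals $\rho\sqrt{p(p-1)}$, while for the tridiagonal matrix it equals $\rho\sqrt{2(p-1)}$, so matching a given value of $\|\Sigma - I\|_F$ requires a much larger $\rho$ in the tridiagonal case.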
Figure 2 (panels: $n = 80$, $p = 40$ and $n = 200$, $p = 40$): Power curves of the $\psi$ test (blue, solid) and the CLRT (red, dashed) against
the tridiagonal alternative. Each dot is obtained from 5000 repetitions, and the curves are
then obtained via linear interpolation.

5 Discussion 

We have focused in the present paper on testing the hypotheses under the Frobenius norm.
The technical arguments developed in this paper can also be used for testing under other
matrix norms. Consider, for example, testing (1) against the following composite alternative
hypothesis

$$H_1 : \Sigma \in \Theta, \quad \text{where } \Theta = \Theta_n = \{\Sigma : \|\Sigma - I\|_2 \ge \epsilon_n\}.$$

Here $\|\cdot\|_2$ is the spectral norm defined by $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2$. Define

$$\Theta_s(b) = \{\Sigma : \|\Sigma - I\|_2 \ge b\sqrt{p/n}\}. \qquad (16)$$

Then the same lower bound holds for $\Theta_s(b)$. To be more precise, we have the following
result.

Theorem 6. Let $0 < \alpha < \beta < 1$. Suppose that as $n \to \infty$, $p \to \infty$ and $p/n \le \kappa$ for some
constant $\kappa < \infty$ and all $n$. Then there exists a constant $b = b(\kappa, \beta - \alpha) < 1$, such that for
any test $\phi$ with significance level $0 < \alpha < 1$ for testing $H_0 : \Sigma = I$,

$$\limsup_{n\to\infty} \inf_{\Sigma \in \Theta_s(b)} \mathbb{E}_\Sigma\phi < \beta.$$
The proof of Theorem 6 is analogous to that of Theorem 1. We believe that the rate
of $\sqrt{p/n}$ in the lower bound is sharp. It is however unclear which test is optimal against
the alternative (16) under the spectral norm. Obtaining a matching upper bound for a
practically useful test is an interesting project for future research.
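
The difference between the two norms can be seen on simple examples. The sketch below, with hypothetical function names, computes both distances with `numpy.linalg.norm`; it is meant only to illustrate that the spectral norm never exceeds the Frobenius norm, and that the two coincide for rank one perturbations of the identity.

```python
import numpy as np

def spectral_distance(Sigma):
    """Spectral norm (largest singular value) of Sigma - I."""
    return float(np.linalg.norm(Sigma - np.eye(Sigma.shape[0]), 2))

def frobenius_distance(Sigma):
    """Frobenius norm of Sigma - I."""
    return float(np.linalg.norm(Sigma - np.eye(Sigma.shape[0]), "fro"))
```

For the equi-correlation matrix with parameter $\rho$, the spectral distance is $\rho(p-1)$ while the Frobenius distance is $\rho\sqrt{p(p-1)}$, so the two alternatives classes weight such dense perturbations quite differently.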

The results in the current paper also shed light on the problem of testing for
independence, i.e., $H_0 : R = I$, where $R$ is the population correlation matrix. Following Remark
1, the proofs of Theorems 1 and 6 can be used directly to establish the same lower bound
results on testing the correlation matrix.

Onatski, et al [15] also studied the hypothesis testing problem (1), but their attention
is restricted to testing against alternatives that are rank one perturbations to the identity
matrix. That is, under the alternative $H_1$ the covariance matrix belongs to a set of the form
$\{I + hvv' : \|v\|_2 = 1\}$. The asymptotic regime is restricted to $p/n \to c \in (0, \infty)$. In this
asymptotic regime, Theorem 7 in [15] gives a lower bound result analogous to Theorem
1. However, it does not cover the case when $p/n \to 0$, nor can it be extended to the
case of testing correlation matrices. In addition, we notice that though the result in [15]
enables one to study the asymptotic power of all the eigenvalue-based tests on each such set
when $p/n \to c \in (0, \infty)$, it does not give a minimax claim as we did in Theorem 4.

The results in this paper also raise a number of interesting questions for future research.
One example is the testing of equality of two covariance matrices based on the independent
random samples $X_1, \dots, X_{n_1} \stackrel{iid}{\sim} N_p(\mu_1, \Sigma_1)$ and $Y_1, \dots, Y_{n_2} \stackrel{iid}{\sim} N_p(\mu_2, \Sigma_2)$. The validity of
many commonly used statistical procedures including the classical Fisher's linear
discriminant analysis requires the assumption of equal covariance matrices. So it is of interest to
test $H_0 : \Sigma_1 = \Sigma_2$. Motivated by an unbiased estimator of the Frobenius norm of $\Sigma_1 - \Sigma_2$,
Chen and Li [6] proposed a test using a linear combination of $U$-statistics and studied its
power. Cai, et al introduced a test based on the maximum of the standardized
differences between the entries of the two sample covariance matrices. The test is shown to be
powerful against sparse alternatives and robust with respect to the population
distributions. However, the optimality of the two-sample tests has not been well studied. This is
an important topic for future research that is of both theoretical and practical interest.

In the present paper, no structural assumption such as sparsity or bandedness is imposed on the
alternative class of the covariance matrices. An optimal test against a structured
alternative is potentially very different from the test (10) considered here. Recently, Cai and
Jiang [3] considered testing the null hypothesis that $\Sigma$ is a banded matrix and introduced
a test based on the coherence of a random matrix. Xiao and Wu [18] proposed a test for
testing $H_0 : \Sigma = I$ against sparse alternatives. Their test is based on the maximum of
the standardized entries of the sample covariance matrix. The limiting null distribution
is shown to be a type I extreme value distribution, but the power of the test is not analyzed.
It is interesting to investigate the optimality of these testing problems with structured
alternatives.



6 Proofs 

In this section, we prove Theorems 1 and 4 and Propositions 3 and 5.

6.1 Proof of Theorem 1

Recall that $P_0$ is the probability measure when $X_1, \dots, X_n \stackrel{iid}{\sim} N_p(0, I)$ and $P_v$ is the
probability measure when $X_1, \dots, X_n \stackrel{iid}{\sim} N_p(0, \Sigma_v)$. In addition, $P_1 = \frac{1}{2^p}\sum_{v \in \{\pm 1\}^p} P_v$ is the
average measure of the $P_v$'s. Let $f_0$ and $f_1$ be the density functions of $P_0$ and $P_1$
respectively. By the discussion following Theorem 1, we could prove Theorem 1 by showing that
$\int f_1^2/f_0 - 1 < 4(\beta - \alpha)^2$.



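After some basic calculation (see Appendix A.1 for details), we obtain that if $b < b_0(\kappa)$
such that

$$b < 1 \quad \text{and} \quad [\,\ldots\,], \qquad (17)$$

then

$$\int \frac{f_1^2}{f_0} = \frac{(1 - a^2)^{n - np/2}}{[1 + (p-1)a^2]^n}\; \mathbb{E}\Big[1 - \Big(\frac{pa}{1 + (p-1)a^2}\Big)^2 \Big(\frac{\mathbf{1}'V}{p}\Big)^2\Big]^{-n/2}. \qquad (18)$$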

Here, the expectation is taken w.r.t. $V = (V_1, \dots, V_p)'$ where the $V_j$'s are i.i.d. Rademacher
random variables which take values $\pm 1$ with equal probability.



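Note that (17) and $p/n \le \kappa$ imply

$$\Big(\frac{pa}{1 + (p-1)a^2}\Big)^2 \le \frac{1}{2}.$$

Also note that $(\mathbf{1}'V/p)^2 \in [0, 1]$. Thus, letting $b_{np} = \big(\frac{pa}{1 + (p-1)a^2}\big)^2$ and $(\mathbf{1}'V/p)^2 = \zeta_p$, we have

$$\mathbb{E}(1 - b_{np}\zeta_p)^{-n/2} = \mathbb{E}\big[(1 - b_{np}\zeta_p)^{-1/(b_{np}\zeta_p)}\big]^{n b_{np}\zeta_p/2} \le \mathbb{E}\exp\Big(\frac{\log 4}{2}\, n b_{np}\zeta_p\Big) \le \mathbb{E}\exp\Big(\frac{\log 4}{2} \cdot \frac{b^2p^2}{p-1}\,\zeta_p\Big),$$

where the first inequality uses $0 < (1 - x)^{-1/x} \le 4$ for all $x \in [0, 1/2]$, and the second uses $b_{np} \le p^2b^2/[n(p-1)]$.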

For $\zeta_p$, Hoeffding's inequality [9], applied to Rademacher variables, yields



P(^P > A) < 2e-2p^, for all A >0. 
Thus, we obtain



E exp 



< 1 + / 2 exp j — du 

Here, the last equality holds if $2(p - 1) > b^2 p \log 2$, which is always true for large $p$ since
$b < 1$.



In addition, with b satisfying (17), when n — t- oo, 
(l-o2)'^-"P/2^e^', 



1 + {p - l)aY ^ 



Therefore, for large enough $n \ge n_0(\kappa)$,

$$\int \frac{f_1^2}{f_0} - 1 \le \frac{8b^2p\log 2}{2(p-1) - b^2p\log 2},$$

which, for sufficiently small $b < b_0(\kappa, \beta - \alpha)$, is no larger than $4(\beta - \alpha)^2$. This completes
the proof.

6.2 Proof of Proposition 3

Following the outline of proof after Proposition 3, for $E_1$ and $E_2$ defined in (14), we complete
the proof below by showing that $E_1/\sigma_n^4(\Sigma) = O(1/n)$ and $E_2/\sigma_n^4(\Sigma) = O(\mathrm{tr}(\Sigma^4)/\mathrm{tr}^2(\Sigma^2))$.

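To this end, we start with some preliminaries. Throughout the proof, $\mathbb{E}$ and $\mathrm{Var}$ are
used as abbreviations for $\mathbb{E}_\Sigma$ and $\mathrm{Var}_\Sigma$, respectively. Recall the martingale representation
(11), where each martingale difference $D_{nk}$ has the explicit expression (12). For $D_{nk}$, its
conditional variance is

$$\sigma_{nk}^2 = \frac{8}{n^2(n-1)^2}\,\mathrm{tr}(Q_{k-1}\Sigma Q_{k-1}\Sigma) + \frac{16}{n^2(n-1)}\,\mathrm{tr}\big(Q_{k-1}(\Sigma^3 - \Sigma^2)\big) + \frac{8}{n^2}\,\mathrm{tr}\big(\Sigma^2(\Sigma - I)^2\big). \qquad (19)$$

Detailed derivation of (12) and (19) can be found in Appendix A.3. With (19), it is not
difficult to verify that

$$\mathbb{E}\sigma_{nk}^2 = \frac{8(k-1)}{n^2(n-1)^2}\big[\mathrm{tr}^2(\Sigma^2) + \mathrm{tr}(\Sigma^4)\big] + \frac{8}{n^2}\,\mathrm{tr}\big(\Sigma^2(\Sigma - I)^2\big),$$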
and that $\sigma_n^2 = \mathrm{Var}(T_n) = \sum_{k=1}^n \mathbb{E}[\sigma_{nk}^2]$. Last but not least, we have for any $j > k$,

$$\mathbb{E}_{k-1}\big[\sigma_{nj}^2 - \mathbb{E}\sigma_{nj}^2\big] = \sigma_{nk}^2 - \mathbb{E}\sigma_{nk}^2. \qquad (20)$$

Now, we turn to the studies of $E_1$ and $E_2$.

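Term $E_1$. We begin with the first term. Decompose the covariance matrix (as in [7]) as
$\Sigma = \Gamma\Gamma'$, with $\Gamma \in \mathbb{R}^{p\times p}$. Then, we have the representation

$$X_i = \Gamma Z_i, \quad Z_i \stackrel{iid}{\sim} N_p(0, I), \quad i = 1, \dots, n. \qquad (21)$$

We further define

$$A = \Gamma'\Gamma, \qquad M_{k-1} = \Gamma' \sum_{i=1}^{k-1}(X_iX_i' - \Sigma)\,\Gamma = A \sum_{i=1}^{k-1}(Z_iZ_i' - I)\,A.$$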

With the above definition, (12) can be rewritten as

$$D_{nk} = \frac{2}{n(n-1)}\big[Z_k'M_{k-1}Z_k - \mathrm{tr}(M_{k-1})\big] + \frac{2}{n}\big[Z_k'(A^2 - A)Z_k - \mathrm{tr}(A^2 - A)\big].$$
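
The identities behind this rewriting can be verified numerically. The sketch below takes $\Gamma$ to be the Cholesky factor of $\Sigma$, which is one admissible choice of a matrix with $\Sigma = \Gamma\Gamma'$ (an assumption made for illustration; any square root would do), and the function name `representation` is hypothetical.

```python
import numpy as np

def representation(Sigma, z):
    """Return x = Gamma z and A = Gamma' Gamma for the Cholesky factor Gamma of Sigma."""
    Gamma = np.linalg.cholesky(Sigma)   # lower triangular, Sigma = Gamma @ Gamma.T
    A = Gamma.T @ Gamma
    return Gamma @ z, A

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
z = np.array([1.0, -2.0])
x, A = representation(Sigma, z)
# Quadratic forms in x become quadratic forms in z with powers of A.
lhs = (x @ Sigma @ x, x @ x, np.trace(Sigma @ Sigma))
rhs = (z @ A @ A @ z, z @ A @ z, np.trace(A @ A))
```

Each entry of `lhs` agrees with the corresponding entry of `rhs`, which is why the terms $X_k'\Sigma X_k - X_k'X_k$ and $\mathrm{tr}(\Sigma^2) - \mathrm{tr}(\Sigma)$ in (12) collapse to a single centered quadratic form in $A^2 - A$.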

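Therefore, we obtain from the Cauchy-Schwarz inequality and Lemma 3 that, for a numeric constant $C$,

$$\mathbb{E}[D_{nk}^4] \le \frac{C}{n^4}\,\mathbb{E}\big[Z_k'(A^2 - A)Z_k - \mathrm{tr}(A^2 - A)\big]^4 + \frac{C}{n^4(n-1)^4}\,\mathbb{E}\big[Z_k'M_{k-1}Z_k - \mathrm{tr}(M_{k-1})\big]^4.$$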

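For $\mathrm{tr}(M_{k-1}^2)$, we use the following lemma, the proof of which is given in Appendix A.3.

Lemma 2. For $\mathrm{tr}(M_{k-1}^2)$, we have

$$\mathbb{E}[\mathrm{tr}(M_{k-1}^2)] = (k-1)\big[\mathrm{tr}^2(\Sigma^2) + \mathrm{tr}(\Sigma^4)\big],$$

$$\mathrm{Var}[\mathrm{tr}(M_{k-1}^2)] = (k-1)\big[24\,\mathrm{tr}(\Sigma^8) + 16\,\mathrm{tr}(\Sigma^6)\mathrm{tr}(\Sigma^2) + 8\,\mathrm{tr}^2(\Sigma^4) + 8\,\mathrm{tr}(\Sigma^4)\mathrm{tr}^2(\Sigma^2)\big] + 2(k-1)(k-2)\big[6\,\mathrm{tr}(\Sigma^8) + 2\,\mathrm{tr}^2(\Sigma^4)\big].$$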

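For any sequences $\{a_n\}$ and $\{b_n\}$ of positive numbers, write $a_n \lesssim b_n$ if $\limsup_{n\to\infty} a_n/b_n <
\infty$. Note that $\mathrm{tr}(\Sigma^6) \le \mathrm{tr}(\Sigma^4)\mathrm{tr}(\Sigma^2)$ and $\mathrm{tr}(\Sigma^8) \le \mathrm{tr}^2(\Sigma^4)$. Since $\mathrm{tr}(\Sigma^4) = o(\mathrm{tr}^2(\Sigma^2))$,
Lemma 2 implies that

$$\mathbb{E}[D_{nk}^4] \lesssim \frac{1}{n^4}\,\mathrm{tr}^2\big(\Sigma^2(\Sigma - I)^2\big) + \frac{1}{n^6}\,\mathrm{tr}^4(\Sigma^2),$$

and hence

$$E_1 \lesssim \frac{1}{n^3}\,\mathrm{tr}^2\big(\Sigma^2(\Sigma - I)^2\big) + \frac{1}{n^5}\,\mathrm{tr}^4(\Sigma^2). \qquad (22)$$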
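Term $E_2$. For $E_2$, we can simplify it as

$$E_2 = \mathbb{E}\Big[\sum_{k=1}^n \big(\sigma_{nk}^2 - \mathbb{E}\sigma_{nk}^2\big)\Big]^2$$
$$= \sum_{k=1}^n \mathrm{Var}(\sigma_{nk}^2) + 2\sum_{k=1}^{n-1}\sum_{l=k+1}^{n} \mathbb{E}\big[(\sigma_{nk}^2 - \mathbb{E}\sigma_{nk}^2)(\sigma_{nl}^2 - \mathbb{E}\sigma_{nl}^2)\big]$$
$$= \sum_{k=1}^n \mathrm{Var}(\sigma_{nk}^2) + 2\sum_{k=1}^{n-1}(n - k)\,\mathrm{Var}(\sigma_{nk}^2)$$
$$= \sum_{k=1}^n (2n - 2k + 1)\,\mathrm{Var}(\sigma_{nk}^2).$$

Here, the cross terms are evaluated by conditioning on $\mathcal{F}_{k-1}$ and using (20).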

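Note that $\mathrm{tr}(Q_{k-1}\Sigma Q_{k-1}\Sigma) = \mathrm{tr}(M_{k-1}^2)$ and $\mathrm{tr}(Q_{k-1}(\Sigma^3 - \Sigma^2)) = \mathrm{tr}(M_{k-1}(A^2 - A))$.
So, by (19), there exist numeric constants $C$ and $C'$, such that

$$\mathrm{Var}(\sigma_{nk}^2) \le \frac{C}{n^4(n-1)^4}\,\mathrm{Var}[\mathrm{tr}(M_{k-1}^2)] + \frac{C'}{n^4(n-1)^2}\,\mathrm{Var}[\mathrm{tr}(M_{k-1}(A^2 - A))].$$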

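We have studied $\mathrm{Var}[\mathrm{tr}(M_{k-1}^2)]$ in Lemma 2. On the other hand, we have from Lemma 3
that

$$\mathrm{Var}[\mathrm{tr}(M_{k-1}(A^2 - A))] = (k-1)\,\mathrm{Var}[\mathrm{tr}(AZZ'A(A^2 - A))] = (k-1)\,\mathrm{Var}[Z'(A^4 - A^3)Z]$$
$$= (k-1)\big\{\mathbb{E}[(Z'(A^4 - A^3)Z)^2] - [\mathbb{E}(Z'(A^4 - A^3)Z)]^2\big\}$$
$$= (k-1)\big[2\,\mathrm{tr}((A^4 - A^3)^2) + \mathrm{tr}^2(A^4 - A^3) - \mathrm{tr}^2(A^4 - A^3)\big]$$
$$= 2(k-1)\,\mathrm{tr}\big(\Sigma^6(\Sigma - I)^2\big)$$
$$\le 2(k-1)\,\mathrm{tr}(\Sigma^4)\,\mathrm{tr}\big(\Sigma^2(\Sigma - I)^2\big).$$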

Since tr(E6) < tr(S'^)tr(S2), tr(S8) < tr2(S'^) and tr(S'') = o(tr2(S2)), we obtain that 
Var(a^,) < Atr(S^)tr2(s2) + ^tr^(S^) + Atr(S^)tr(s2(S - if), 



n° n 
which leads to the bound 

E2 < ^tr(S4)tr(s2(S - If) + ^tr2(s4) + -^tri^Vi^')- (23) 



Summing up. By (9), we have
\[
\sigma_n^4 \asymp \frac{1}{n^2}\operatorname{tr}^2\bigl(\Sigma^2(\Sigma-I)^2\bigr) + \frac{1}{n^3}\operatorname{tr}^2(\Sigma^2)\operatorname{tr}\bigl(\Sigma^2(\Sigma-I)^2\bigr) + \frac{1}{n^4}\operatorname{tr}^4(\Sigma^2). \tag{24}
\]
Here and after, for any sequences $\{a_n\}$ and $\{b_n\}$ of positive numbers, we write $a_n \asymp b_n$ if $a_n/b_n$ is bounded away from both $0$ and $\infty$. Thus, we obtain that
\[
\sigma_n^{-4}E_1 = O(n^{-1}), \qquad \sigma_n^{-4}E_2 = O\bigl(\operatorname{tr}(\Sigma^4)/\operatorname{tr}^2(\Sigma^2)\bigr).
\]
Plugging these estimates in Lemma 1, we complete the proof.

6.3 Proof of Theorem 4

Following the discussion after Theorem 4, we give below the detailed proof of the first claim in the theorem. In particular, we bound the power of the test separately on $\Theta(B)$ and $\Theta(b,B)$, which are defined in (15).



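Case 1: $\Theta(B)$. Here, we shall proceed heavy-handedly by using Chebyshev's inequality, because the alternative class is sufficiently far away from $H_0$.

For any $\Sigma \in \Theta(B)$, there exists $\tau \ge B$, s.t. $\|\Sigma - I\|_F = \tau\sqrt{p/n}$. Suppose $B$ is large enough s.t. $\tau^2 \ge B^2 > 8z_{1-\alpha}$. Note that $\sigma_n(I) = (2p/n)(1+o(1))$, and so
\[
\mathbb{E}_\Sigma T_n = \|\Sigma - I\|_F^2 = \frac{\tau^2}{2}\,\sigma_n(I)(1+o(1)) > z_{1-\alpha}\,\sigma_n(I).
\]
Thus, we can use Chebyshev's inequality to bound the type II error of $\psi$ at $\Sigma$ as the following:
\begin{align*}
1 - \mathbb{E}_\Sigma\psi &= \mathbb{P}_\Sigma\bigl(T_n \le z_{1-\alpha}\sigma_n(I)\bigr) = \mathbb{P}_\Sigma\bigl(T_n - \mathbb{E}_\Sigma T_n \le z_{1-\alpha}\sigma_n(I) - \mathbb{E}_\Sigma T_n\bigr)\\
&\le \mathbb{P}_\Sigma\bigl(|T_n - \mathbb{E}_\Sigma T_n| \ge |z_{1-\alpha}\sigma_n(I) - \mathbb{E}_\Sigma T_n|\bigr)\\
&\le \frac{\operatorname{Var}_\Sigma(T_n)}{[z_{1-\alpha}\sigma_n(I) - \mathbb{E}_\Sigma T_n]^2}. \tag{25}
\end{align*}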

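For $\operatorname{Var}_\Sigma(T_n) = \sigma_n^2(\Sigma)$, we have its explicit expression given in (9). Let $\lambda_{\max}(\Sigma)$ denote the largest eigenvalue of $\Sigma$. When $\|\Sigma - I\|_F = \tau\sqrt{p/n}$, we have $\lambda_{\max}(\Sigma) \le 1 + \tau\sqrt{p/n}$, and so
\[
\operatorname{tr}(\Sigma^2) \le p\Bigl(1 + \frac{\tau}{\sqrt{n}}\Bigr)^2, \qquad
\operatorname{tr}\bigl(\Sigma^2(\Sigma-I)^2\bigr) \le \lambda_{\max}^2(\Sigma)\,\|\Sigma - I\|_F^2 \le \frac{\tau^2 p}{n}\Bigl(1 + \tau\sqrt{\frac{p}{n}}\Bigr)^2 .
\]
Since $\operatorname{tr}(\Sigma^4) \le \operatorname{tr}^2(\Sigma^2)$ and $\sigma_n^2(I) = (4p^2/n^2)(1+o(1))$, the above inequalities, together with (9), lead to
\[
\frac{\operatorname{Var}_\Sigma(T_n)}{\sigma_n^2(I)} \le \Bigl[2\Bigl(1+\frac{\tau}{\sqrt{n}}\Bigr)^4 + \frac{2\tau^2}{p}\Bigl(1+\tau\sqrt{\frac{p}{n}}\Bigr)^2\Bigr](1+o(1)).
\]
Since $\tau^2/2 - z_{1-\alpha} \ge 3\tau^2/8$, there exists some constant $C_\alpha$ depending only on $\alpha$, such that
\[
\frac{\operatorname{Var}_\Sigma(T_n)}{[z_{1-\alpha}\sigma_n(I) - \mathbb{E}_\Sigma T_n]^2}
\le \frac{2(1+\tau/\sqrt{n})^4 + (2\tau^2/p)(1+\tau\sqrt{p/n})^2}{(\tau^2/2 - z_{1-\alpha})^2}(1+o(1))
\le C_\alpha\Bigl[\Bigl(\frac{1}{\tau}+\frac{1}{\sqrt{n}}\Bigr)^4 + \frac{1}{\tau^2 p} + \frac{1}{n}\Bigr].
\]
Note that all the $o(1)$ terms in the above derivation are uniform over $\Theta(B)$. Therefore, given $\alpha$ and $b$, there exists a constant $B = B(\alpha, b)$, such that
\begin{align*}
\liminf_{n\to\infty}\inf_{\Theta(B)} \mathbb{E}_\Sigma\psi
&\ge 1 - \limsup_{n\to\infty}\sup_{\Theta(B)}\frac{\operatorname{Var}_\Sigma(T_n)}{[z_{1-\alpha}\sigma_n(I) - \mathbb{E}_\Sigma T_n]^2}\\
&\ge 1 - C_\alpha\limsup_{n\to\infty}\Bigl[\Bigl(\frac{1}{B}+\frac{1}{\sqrt{n}}\Bigr)^4 + \frac{1}{B^2 p} + \frac{1}{n}\Bigr]\\
&\ge 1 - \Phi\Bigl(z_{1-\alpha} - \frac{b^2}{2}\Bigr) > \alpha. \tag{26}
\end{align*}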



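Case 2: $\Theta(b,B)$. On this subset, we use Proposition 3 to obtain the following uniform approximation to the power function by the normal distribution function:
\[
\sup_{\Theta(b,B)}\Bigl|\mathbb{E}_\Sigma\psi - \Bigl[1 - \Phi\Bigl(z_{1-\alpha} - \frac{\|\Sigma - I\|_F^2}{2p/n}\Bigr)\Bigr]\Bigr| \to 0. \tag{27}
\]
If (27) is true, then we obtain
\[
\liminf_{n\to\infty}\inf_{\Theta(b,B)}\mathbb{E}_\Sigma\psi
= \liminf_{n\to\infty}\inf_{\Theta(b,B)}\Bigl[1 - \Phi\Bigl(z_{1-\alpha} - \frac{\|\Sigma - I\|_F^2}{2p/n}\Bigr)\Bigr]
= 1 - \Phi\Bigl(z_{1-\alpha} - \frac{b^2}{2}\Bigr) > \alpha.
\]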



Together with (26), this leads to the desired claim. 



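Turn to the proof of (27). First, note that uniformly on $\Theta(b, B)$, we have
\[
p\bigl(1 - B/\sqrt{n}\bigr)^2 \le \operatorname{tr}(\Sigma^2) \le p\bigl(1 + B/\sqrt{n}\bigr)^2, \tag{28}
\]
and $\operatorname{tr}(\Sigma^4) \le \lambda_{\max}^2(\Sigma)\operatorname{tr}(\Sigma^2) \le \bigl(1 + B\sqrt{p/n}\bigr)^2\, p\bigl(1 + B/\sqrt{n}\bigr)^2$. Therefore, as $n \to \infty$,
\[
\sup_{\Theta(b,B)}\bigl|\operatorname{tr}(\Sigma^2)/p - 1\bigr| \to 0, \qquad \sup_{\Theta(b,B)}\operatorname{tr}(\Sigma^4)/\operatorname{tr}^2(\Sigma^2) \to 0. \tag{29}
\]
So the condition of Proposition 3 is satisfied. Next, we observe that
\[
\mathbb{E}_\Sigma\psi = \mathbb{P}_\Sigma\bigl(T_n > z_{1-\alpha}\sigma_n(I)\bigr)
= \mathbb{P}_\Sigma\Bigl(\frac{T_n - \|\Sigma - I\|_F^2}{\sigma_n(\Sigma)} > z_{1-\alpha}\frac{\sigma_n(I)}{\sigma_n(\Sigma)} - \frac{\|\Sigma - I\|_F^2}{\sigma_n(\Sigma)}\Bigr).
\]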



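Thus, Proposition 3 and (29) together imply that
\[
\sup_{\Theta(b,B)}\Bigl|\mathbb{E}_\Sigma\psi - \Bigl[1 - \Phi\Bigl(z_{1-\alpha}\frac{\sigma_n(I)}{\sigma_n(\Sigma)} - \frac{\|\Sigma - I\|_F^2}{\sigma_n(\Sigma)}\Bigr)\Bigr]\Bigr| \to 0.
\]
To complete the proof, what is left to be verified is that
\[
\sup_{\Theta(b,B)}\Bigl|\frac{\sigma_n(\Sigma)}{2p/n} - 1\Bigr| \to 0, \tag{30}
\]
because it implies $\sup_{\Theta(b,B)}\bigl|\sigma_n(I)/\sigma_n(\Sigma) - 1\bigr| \to 0$, which together with the last display before (30), leads to (27). To show (30), first recall the expression of $\sigma_n^2(\Sigma)$ in (9). By (29), we obtain that the first term in (9) is $4(p^2/n^2)(1 + o(1))$ where $o(1)$ is uniform on $\Theta(b, B)$. On the other hand,
\[
\operatorname{tr}\bigl(\Sigma^2(\Sigma - I)^2\bigr) \le \lambda_{\max}^2(\Sigma)\,\|\Sigma - I\|_F^2 \le \Bigl(1 + B\sqrt{\frac{p}{n}}\Bigr)^2\frac{B^2 p}{n} \le C(B)\max\Bigl(1, \frac{p}{n}\Bigr)\frac{p}{n}.
\]
Here, $C(B)$ is a constant depending only on $B$. Therefore, we have that the second term in (9) is of order $o(p^2/n^2)$ uniformly over $\Theta(b, B)$. Putting the two parts together leads to (30). This completes the proof.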



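6.4 Proof of Proposition 5

Fix any $\tau \in (0, 1)$. At each dimension $p$, consider a single point in $C(\tau)$:
\[
\Sigma^* = I_{p\times p} + \tau\sqrt{\frac{p}{n}}\,uu',
\]
where $u$ is an arbitrarily fixed unit vector in $\mathbb{R}^p$. Since $\tau < 1$, Proposition 10 in [15] leads to
\[
\lim_{n\to\infty}\mathbb{E}_{\Sigma^*}\phi_{\mathrm{CLR}} = 1 - \Phi\bigl(z_{1-\alpha} - h(\tau, c)\bigr), \quad\text{for}\quad h(\tau, c) = \frac{\tau\sqrt{c} - \log(1 + \tau\sqrt{c})}{\sqrt{-2\log(1-c) - 2c}}.
\]
Note that for all $\tau > 0$ and $c \in (0, 1)$, $\tau^2/2 > h(\tau, c) > 0$. Therefore,
\begin{align*}
\liminf_{n\to\infty}\inf_{C(\tau)}\mathbb{E}_\Sigma\psi &= 1 - \Phi\Bigl(z_{1-\alpha} - \frac{\tau^2}{2}\Bigr)\\
&> 1 - \Phi\bigl(z_{1-\alpha} - h(\tau, c)\bigr)\\
&= \lim_{n\to\infty}\mathbb{E}_{\Sigma^*}\phi_{\mathrm{CLR}}\\
&\ge \limsup_{n\to\infty}\inf_{C(\tau)}\mathbb{E}_\Sigma\phi_{\mathrm{CLR}}.
\end{align*}
The proof of the second claim is obtained by replacing $C(\tau)$ with $\Theta(b)$ and $\tau$ with $b$ in the above arguments.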
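The comparison in this proof can be tabulated numerically. The sketch below is illustrative only: the function names are hypothetical, the two limiting power expressions are taken as stated in the proof above, and `c` stands for the limit of $p/n$ (an assumption about the notation, which is fixed outside this section).

```python
from math import log1p, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def u_test_power(tau, alpha=0.05):
    """Limiting power 1 - Phi(z_{1-alpha} - tau^2 / 2) of the U-statistic test."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z - tau ** 2 / 2)


def clrt_shift(tau, c):
    """h(tau, c) as written in the proof above (hypothetical helper name)."""
    h = tau * sqrt(c)
    return (h - log1p(h)) / sqrt(-2 * log1p(-c) - 2 * c)


def clrt_power(tau, c, alpha=0.05):
    """Limiting CLRT power 1 - Phi(z_{1-alpha} - h(tau, c)) at the spiked point."""
    z = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z - clrt_shift(tau, c))


if __name__ == "__main__":
    for tau in (0.3, 0.6, 0.9):
        for c in (0.2, 0.5, 0.8):
            print(tau, c, round(u_test_power(tau), 4), round(clrt_power(tau, c), 4))
```

On any grid with $\tau, c \in (0,1)$ the first power column should exceed the second, which is the strict inequality $\tau^2/2 > h(\tau, c)$ in numerical form.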
A Technical details

A.1 Proof details for Theorem 1

Here we give the calculation leading to (18) in the proof of Theorem 1.

Consider $\Theta^*(b)$ in ([s]). For any $v \in \{\pm 1\}^p$, we have $\|\Sigma_v - I\|_F = b\sqrt{p/n}$, $\operatorname{diag}(\Sigma_v) = (1, \dots, 1)$, and for $a = b/\sqrt{n(p-1)}$,
\[
\det\Sigma_v = (1-a)^{p-1}[1 + (p-1)a], \qquad
\Sigma_v^{-1} = \frac{1}{1-a}\,I - \frac{1}{p}\Bigl(\frac{1}{1-a} - \frac{1}{1+(p-1)a}\Bigr)vv'. \tag{31}
\]
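A quick numerical check of (31) is sketched below. It assumes that the matrices in $\Theta^*(b)$ have the form $\Sigma_v = (1-a)I + a\,vv'$, which is consistent with the unit diagonal, Frobenius norm and determinant quoted above but is not spelled out in this section; the helper names are hypothetical.

```python
import numpy as np


def sigma_v(v, b, n):
    """Assumed form Sigma_v = (1 - a) I + a v v' with a = b / sqrt(n (p - 1))."""
    p = v.size
    a = b / np.sqrt(n * (p - 1))
    return (1 - a) * np.eye(p) + a * np.outer(v, v), a


def closed_form_det_inv(v, a):
    """Determinant and inverse of Sigma_v according to (31)."""
    p = v.size
    det = (1 - a) ** (p - 1) * (1 + (p - 1) * a)
    coef = (1 / (1 - a) - 1 / (1 + (p - 1) * a)) / p
    inv = np.eye(p) / (1 - a) - coef * np.outer(v, v)
    return det, inv


rng = np.random.default_rng(0)
p, n, b = 6, 50, 0.8                      # arbitrary illustrative sizes
v = rng.choice([-1.0, 1.0], size=p)       # a sign vector in {-1, +1}^p
S, a = sigma_v(v, b, n)
det, inv = closed_form_det_inv(v, a)
print(det, np.linalg.det(S))              # the two determinants should agree
```

Both formulas are instances of the rank-one update identities (matrix determinant lemma and Sherman-Morrison), so the agreement does not depend on the particular sign vector drawn.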



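Therefore, we have
\begin{align*}
f_0(x_1, \dots, x_n) &= \frac{1}{(2\pi)^{np/2}}\exp\Bigl\{-\frac{1}{2}\sum_{i=1}^n x_i'x_i\Bigr\},\\
f_v(x_1, \dots, x_n) &= \frac{1}{(2\pi)^{np/2}(\det\Sigma_v)^{n/2}}\exp\Bigl\{-\frac{1}{2}\sum_{i=1}^n x_i'\Sigma_v^{-1}x_i\Bigr\}\\
&= \frac{1}{(2\pi)^{np/2}(1-a)^{n(p-1)/2}[1+(p-1)a]^{n/2}}
\exp\Bigl\{-\frac{1}{2(1-a)}\sum_{i=1}^n x_i'x_i + \frac{1}{2p}\Bigl(\frac{1}{1-a} - \frac{1}{1+(p-1)a}\Bigr)\sum_{i=1}^n (v'x_i)^2\Bigr\}.
\end{align*}
And so
\begin{align*}
\frac{f_1^2}{f_0} &= \frac{1}{(1-a)^{n(p-1)}[1+a(p-1)]^{n}(2\pi)^{np/2}}\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{1+a}{1-a}\Bigr)\sum_{i=1}^n x_i'x_i\Bigr\}\\
&\quad\times\Bigl\{\frac{1}{2^p}\sum_{v\in\{\pm1\}^p}\exp\Bigl[\frac{1}{2p}\Bigl(\frac{1}{1-a} - \frac{1}{1+(p-1)a}\Bigr)\sum_{i=1}^n (v'x_i)^2\Bigr]\Bigr\}^2 .
\end{align*}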

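Now we compute the integral. Fix any $v$, we have
\begin{align*}
&\int\frac{1}{(2\pi)^{np/2}}\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{1+a}{1-a}\Bigr)\sum_{i=1}^n x_i'x_i + \frac{1}{p}\Bigl(\frac{1}{1-a} - \frac{1}{1+(p-1)a}\Bigr)\sum_{i=1}^n (v'x_i)^2\Bigr\}\,dx\\
&\qquad= \Bigl(\frac{1+a}{1-a}\Bigr)^{-np/2}\bigl[\mathbb{E}\exp(tY)\bigr]^n\\
&\qquad= \Bigl(\frac{1+a}{1-a}\Bigr)^{-np/2}(1-2t)^{-n/2}, \qquad (\text{for } t < 1/2)
\end{align*}
where $Y \sim \chi_1^2$, and
\[
t = \Bigl(\frac{1}{1-a} - \frac{1}{1+(p-1)a}\Bigr)\Bigl(\frac{1+a}{1-a}\Bigr)^{-1}. \tag{32}
\]
In addition, fix any $v \ne u$, we have
\[
\int\frac{1}{(2\pi)^{np/2}}\exp\Bigl\{-\frac{1}{2}\Bigl(\frac{1+a}{1-a}\Bigr)\sum_{i=1}^n x_i'x_i + \frac{1}{2p}\Bigl(\frac{1}{1-a} - \frac{1}{1+(p-1)a}\Bigr)\sum_{i=1}^n\bigl[(v'x_i)^2 + (u'x_i)^2\bigr]\Bigr\}\,dx.
\]
Let $X \sim N_p(0, I)$, $Z_1 = v'X/\sqrt{p}$, and $Z_2 = u'X/\sqrt{p}$. Then
\[
\begin{pmatrix} Z_1\\ Z_2\end{pmatrix} \sim N_2\Bigl(0, \begin{pmatrix} 1 & v'u/p\\ v'u/p & 1\end{pmatrix}\Bigr),
\]
and so $Z_1^2 + Z_2^2 \stackrel{d}{=} (1 + v'u/p)Y_1 + (1 - v'u/p)Y_2$, with $Y_1, Y_2 \stackrel{iid}{\sim} \chi_1^2$. Therefore, the second last display equals
\[
\Bigl(\frac{1+a}{1-a}\Bigr)^{-np/2}\bigl[\mathbb{E}\exp(t_1Y_1 + t_2Y_2)\bigr]^n = \Bigl(\frac{1+a}{1-a}\Bigr)^{-np/2}(1-2t_1)^{-n/2}(1-2t_2)^{-n/2},
\]
where
\[
t_1 = \frac{t}{2}\Bigl(1 + \frac{v'u}{p}\Bigr), \qquad t_2 = \frac{t}{2}\Bigl(1 - \frac{v'u}{p}\Bigr).
\]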



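Collecting terms, we obtain after some linear algebra that
\begin{align*}
\int\frac{f_1^2}{f_0} &= \frac{(1-a^2)^{n-np/2}}{[1+(p-1)a^2]^n}\;\mathbb{E}\Bigl[1 - \Bigl(\frac{pa}{1+(p-1)a^2}\Bigr)^2\Bigl(\frac{V'U}{p}\Bigr)^2\Bigr]^{-n/2}\\
&= \frac{(1-a^2)^{n-np/2}}{[1+(p-1)a^2]^n}\;\mathbb{E}\Bigl[1 - \Bigl(\frac{pa}{1+(p-1)a^2}\Bigr)^2\Bigl(\frac{\mathbf{1}'V}{p}\Bigr)^2\Bigr]^{-n/2}.
\end{align*}
Here, both $V$ and $U$ have i.i.d. Rademacher entries which take values $\pm 1$ with equal probability, and in the first expectation, $V$ and $U$ are independent.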



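A.2 Variance of $T_n$

In this part, we establish the variance of $T_n$ given in (9). We begin with a technical lemma, which is closely connected to [7, Proposition A.1].

Lemma 3. Let $Z_1, Z_2 \stackrel{iid}{\sim} N_p(0, I)$, and $M$, $N$ be two $p \times p$ psd matrices, then
\begin{align*}
\mathbb{E}[(Z_1'MZ_1)(Z_1'NZ_1)] &= \operatorname{tr}(M)\operatorname{tr}(N) + 2\operatorname{tr}(MN); \tag{33}\\
\mathbb{E}[(Z_1'MZ_2)^4] &= 3\operatorname{tr}^2(M^2) + 6\operatorname{tr}(M^4); \tag{34}\\
\mathbb{E}[(Z_1'MZ_1 - \operatorname{tr}(M))^4] &= 48\operatorname{tr}(M^4) + 12\operatorname{tr}^2(M^2). \tag{35}
\end{align*}

Proof. Denote the ordered eigenvalues of $M$ by $\lambda_1 \ge \cdots \ge \lambda_p$, and those of $N$ by $\mu_1 \ge \cdots \ge \mu_p$. Let $U_j \stackrel{iid}{\sim} N(0, 1)$, $j = 1, \dots, p$. For (33), we have
\[
\mathbb{E}[(Z_1'MZ_1)(Z_1'NZ_1)] = \mathbb{E}\Bigl[\Bigl(\sum_{j=1}^p\lambda_jU_j^2\Bigr)\Bigl(\sum_{j=1}^p\mu_jU_j^2\Bigr)\Bigr]
= 3\sum_{j=1}^p\lambda_j\mu_j + \sum_{i\ne j}\lambda_i\mu_j = \operatorname{tr}(M)\operatorname{tr}(N) + 2\operatorname{tr}(MN).
\]
For (34), we define $V_j \stackrel{iid}{\sim} N(0, 1)$, $j = 1, \dots, p$, which are independent from the $U_j$'s. Then
\begin{align*}
\mathbb{E}[(Z_1'MZ_2)^4] &= \mathbb{E}\Bigl[\Bigl(\sum_{j=1}^p\lambda_jU_jV_j\Bigr)^4\Bigr]\\
&= \sum_{j=1}^p\lambda_j^4\,\mathbb{E}[U_j^4]\,\mathbb{E}[V_j^4] + \binom{4}{2}\sum_{i<j}\lambda_i^2\lambda_j^2\,\mathbb{E}[U_i^2]\,\mathbb{E}[U_j^2]\,\mathbb{E}[V_i^2]\,\mathbb{E}[V_j^2]\\
&= 9\sum_{j=1}^p\lambda_j^4 + 6\sum_{i<j}\lambda_i^2\lambda_j^2 = 3\operatorname{tr}^2(M^2) + 6\operatorname{tr}(M^4).
\end{align*}
Finally, for (35), we have
\begin{align*}
\mathbb{E}[(Z_1'MZ_1 - \operatorname{tr}(M))^4] &= \mathbb{E}\Bigl[\Bigl(\sum_{j=1}^p\lambda_j(U_j^2 - 1)\Bigr)^4\Bigr]\\
&= \sum_{j=1}^p\lambda_j^4\,\mathbb{E}[(U_j^2-1)^4] + \binom{4}{2}\sum_{i<j}\lambda_i^2\lambda_j^2\,\mathbb{E}[(U_i^2-1)^2]\,\mathbb{E}[(U_j^2-1)^2]\\
&= 60\sum_{j=1}^p\lambda_j^4 + 24\sum_{i<j}\lambda_i^2\lambda_j^2 = 48\operatorname{tr}(M^4) + 12\operatorname{tr}^2(M^2).
\end{align*}
This completes the proof of the lemma. □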
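The three identities of Lemma 3 can be checked numerically in a small dimension. The sketch below is only a sanity check under stated assumptions: it uses $p = 2$, two arbitrary psd matrices chosen for illustration, and tensor-product Gauss-Hermite quadrature (exact for polynomial integrands of this degree) in place of simulation; `gaussian_mean` is a hypothetical helper.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_mean(f, dim, order=10):
    """E[f(Z)] for Z ~ N(0, I_dim) by tensor-product Gauss-Hermite quadrature.

    Exact (up to rounding) when f is a polynomial of degree at most
    2 * order - 1 in each coordinate.
    """
    nodes, weights = hermegauss(order)       # weight function exp(-x^2 / 2)
    probs = weights / np.sqrt(2.0 * np.pi)   # rescale so the weights sum to one
    total = 0.0
    for idx in itertools.product(range(order), repeat=dim):
        idx = list(idx)
        total += np.prod(probs[idx]) * f(nodes[idx])
    return total


tr = np.trace
M = np.array([[2.0, 0.5], [0.5, 1.0]])     # arbitrary psd test matrices (p = 2)
N = np.array([[1.0, -0.3], [-0.3, 3.0]])

# (33): E[(Z'MZ)(Z'NZ)] = tr(M) tr(N) + 2 tr(MN)
lhs33 = gaussian_mean(lambda z: (z @ M @ z) * (z @ N @ z), dim=2)
rhs33 = tr(M) * tr(N) + 2 * tr(M @ N)

# (34): E[(Z1'MZ2)^4] = 3 tr^2(M^2) + 6 tr(M^4); z stacks (Z1, Z2)
lhs34 = gaussian_mean(lambda z: (z[:2] @ M @ z[2:]) ** 4, dim=4)
rhs34 = 3 * tr(M @ M) ** 2 + 6 * tr(M @ M @ M @ M)

# (35): E[(Z'MZ - tr M)^4] = 48 tr(M^4) + 12 tr^2(M^2)
lhs35 = gaussian_mean(lambda z: (z @ M @ z - tr(M)) ** 4, dim=2)
rhs35 = 48 * tr(M @ M @ M @ M) + 12 * tr(M @ M) ** 2

print(lhs33, rhs33, lhs34, rhs34, lhs35, rhs35)
```

The test matrices here do not commute, so the check of (33) does not rely on $M$ and $N$ sharing eigenvectors.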

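In order to understand the variance of $T_n$, we need the following lemma.

Lemma 4. For $X_1, X_2, X_3 \stackrel{iid}{\sim} N_p(0, \Sigma)$, we have
\begin{align*}
\operatorname{Var}(h(X_1, X_2)) &= 2\bigl[\operatorname{tr}^2(\Sigma^2) + \operatorname{tr}(\Sigma^4)\bigr] + 4\operatorname{tr}\bigl(\Sigma^2(\Sigma - I)^2\bigr),\\
\operatorname{Cov}(h(X_1, X_2), h(X_1, X_3)) &= 2\operatorname{tr}\bigl(\Sigma^2(\Sigma - I)^2\bigr).
\end{align*}

Proof. For the variance, we first decompose it as
\[
\operatorname{Var}(h(X_1, X_2)) = \operatorname{Var}\bigl((X_1'X_2)^2\bigr) + 2\operatorname{Var}(X_1'X_1) - 4\operatorname{Cov}\bigl((X_1'X_2)^2, X_1'X_1\bigr).
\]
For $\operatorname{Var}((X_1'X_2)^2) = \mathbb{E}[(X_1'X_2)^4] - [\mathbb{E}(X_1'X_2)^2]^2$, we have from (34) that
\[
\mathbb{E}[(X_1'X_2)^4] = \mathbb{E}[(Z_1'AZ_2)^4] = 3\operatorname{tr}^2(A^2) + 6\operatorname{tr}(A^4) = 3\operatorname{tr}^2(\Sigma^2) + 6\operatorname{tr}(\Sigma^4).
\]
On the other hand, we have
\[
\mathbb{E}[(X_1'X_2)^2] = \mathbb{E}[Z_1'AZ_2Z_2'AZ_1] = \mathbb{E}[\operatorname{tr}(AZ_2Z_2'A)] = \mathbb{E}[Z_2'A^2Z_2] = \operatorname{tr}(A^2) = \operatorname{tr}(\Sigma^2).
\]
Thus, we obtain $\operatorname{Var}((X_1'X_2)^2) = 2\operatorname{tr}^2(\Sigma^2) + 6\operatorname{tr}(\Sigma^4)$. A similar calculation yields that
\[
\operatorname{Var}(X_1'X_1) = 2\operatorname{tr}(\Sigma^2), \qquad \operatorname{Cov}\bigl((X_1'X_2)^2, X_1'X_1\bigr) = 2\operatorname{tr}(\Sigma^3).
\]
Assembling the pieces, we prove the variance formula.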

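For the covariance formula, the basic quantity to compute is
\begin{align*}
&\mathbb{E}\bigl[\bigl((X_1'X_2)^2 - X_1'X_1 - X_2'X_2\bigr)\bigl((X_1'X_3)^2 - X_1'X_1 - X_3'X_3\bigr)\bigr]\\
&\quad= \mathbb{E}(X_1'X_2)^2(X_1'X_3)^2 - \mathbb{E}(X_1'X_1)(X_1'X_3)^2 - \mathbb{E}(X_2'X_2)\,\mathbb{E}(X_1'X_3)^2\\
&\qquad - \mathbb{E}(X_1'X_2)^2(X_1'X_1) + \mathbb{E}(X_1'X_1)^2 + \mathbb{E}(X_1'X_1)\,\mathbb{E}(X_2'X_2)\\
&\qquad - \mathbb{E}(X_3'X_3)\,\mathbb{E}\bigl[(X_1'X_2)^2 - X_1'X_1 - X_2'X_2\bigr]\\
&\quad= \mathbb{E}(X_1'X_2)^2(X_1'X_3)^2 - 2\,\mathbb{E}(X_1'X_1)(X_1'X_3)^2 - 2\,\mathbb{E}(X_1'X_1)\,\mathbb{E}(X_1'X_2)^2 + \mathbb{E}(X_1'X_1)^2 + 3[\mathbb{E}(X_1'X_1)]^2.
\end{align*}
First, we compute $\mathbb{E}(X_1'X_2)^2(X_1'X_3)^2$, for which we have
\begin{align*}
\mathbb{E}(X_1'X_2)^2(X_1'X_3)^2 &= \mathbb{E}\bigl[(Z_1'AZ_2)^2(Z_1'AZ_3)^2\bigr]\\
&= \mathbb{E}\bigl[\mathbb{E}[Z_2'AZ_1Z_1'AZ_2 \mid Z_1]\;\mathbb{E}[Z_3'AZ_1Z_1'AZ_3 \mid Z_1]\bigr]\\
&= \mathbb{E}\bigl[\operatorname{tr}^2(AZ_1Z_1'A)\bigr]\\
&= \mathbb{E}\bigl[(Z_1'A^2Z_1)^2\bigr]\\
&= \operatorname{tr}^2(A^2) + 2\operatorname{tr}(A^4) = \operatorname{tr}^2(\Sigma^2) + 2\operatorname{tr}(\Sigma^4).
\end{align*}
Next, we compute $\mathbb{E}(X_1'X_1)(X_1'X_3)^2$, for which we have
\begin{align*}
\mathbb{E}(X_1'X_1)(X_1'X_3)^2 &= \mathbb{E}\bigl[(Z_1'AZ_1)(Z_1'AZ_3)^2\bigr]\\
&= \mathbb{E}\bigl[(Z_1'AZ_1)\,\mathbb{E}[Z_3'AZ_1Z_1'AZ_3 \mid Z_1]\bigr]\\
&= \mathbb{E}\bigl[(Z_1'AZ_1)(Z_1'A^2Z_1)\bigr]\\
&= 2\operatorname{tr}(A^3) + \operatorname{tr}(A^2)\operatorname{tr}(A) = 2\operatorname{tr}(\Sigma^3) + \operatorname{tr}(\Sigma^2)\operatorname{tr}(\Sigma).
\end{align*}
We further note that $\mathbb{E}(X_1'X_1)^2 = \mathbb{E}(Z_1'AZ_1)^2 = 2\operatorname{tr}(\Sigma^2) + \operatorname{tr}^2(\Sigma)$, that $\mathbb{E}(X_1'X_2)^2 = \operatorname{tr}(\Sigma^2)$, and that $\mathbb{E}(X_1'X_1) = \operatorname{tr}(\Sigma)$. Thus, we obtain that
\begin{align*}
&\mathbb{E}\bigl[\bigl((X_1'X_2)^2 - X_1'X_1 - X_2'X_2\bigr)\bigl((X_1'X_3)^2 - X_1'X_1 - X_3'X_3\bigr)\bigr]\\
&\quad= \operatorname{tr}^2(\Sigma^2) + 2\operatorname{tr}(\Sigma^4) - 4\operatorname{tr}(\Sigma^3) - 4\operatorname{tr}(\Sigma^2)\operatorname{tr}(\Sigma) + 2\operatorname{tr}(\Sigma^2) + 4\operatorname{tr}^2(\Sigma).
\end{align*}
Noting that $\mathbb{E}[(X_1'X_2)^2 - X_1'X_1 - X_2'X_2] = \operatorname{tr}(\Sigma^2) - 2\operatorname{tr}(\Sigma)$, we obtain the claim. □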
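For readers who want to experiment with the statistic, the following sketch computes $T_n$ and the variance expression that Lemma 4 leads to. It assumes the kernel $h(X_1, X_2) = (X_1'X_2)^2 - (X_1'X_1 + X_2'X_2) + p$, which matches the centred products used in the proof above and the identity $\mathbb{E}_\Sigma T_n = \|\Sigma - I\|_F^2$ but is defined outside this section; all function names are hypothetical.

```python
import numpy as np


def t_n_naive(X):
    """T_n = 2 / (n (n - 1)) * sum_{i<j} h(X_i, X_j), by direct double loop."""
    n, p = X.shape
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (X[i] @ X[j]) ** 2 - (X[i] @ X[i] + X[j] @ X[j]) + p
    return 2.0 * total / (n * (n - 1))


def t_n_gram(X):
    """Same statistic computed from the n x n Gram matrix G = X X'."""
    n, p = X.shape
    G = X @ X.T
    d = np.diag(G)
    off_sq = (np.sum(G ** 2) - np.sum(d ** 2)) / 2.0   # sum_{i<j} (X_i'X_j)^2
    # each X_i'X_i appears in exactly n - 1 of the pairs i < j
    total = off_sq - (n - 1) * np.sum(d) + p * n * (n - 1) / 2.0
    return 2.0 * total / (n * (n - 1))


def sigma_n_sq(Sigma, n):
    """4/(n(n-1)) [tr^2(S^2) + tr(S^4)] + (8/n) tr(S^2 (S - I)^2)."""
    p = Sigma.shape[0]
    S2 = Sigma @ Sigma
    D = Sigma - np.eye(p)
    first = 4.0 / (n * (n - 1)) * (np.trace(S2) ** 2 + np.trace(S2 @ S2))
    second = 8.0 / n * np.trace(S2 @ D @ D)
    return first + second
```

The Gram-matrix version avoids the explicit double loop and costs one $n \times p$ by $p \times n$ product, which is convenient when $p$ is much larger than $n$.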
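Proof of (9). With Lemma 4, we have
\begin{align*}
\operatorname{Var}\Bigl(\sum_{i<j}h(X_i, X_j)\Bigr) &= \sum_{i<j}\operatorname{Var}(h(X_i, X_j)) + 2\sum{}^{*}\operatorname{Cov}\bigl(h(X_i, X_j), h(X_{i'}, X_{j'})\bigr)\\
&= \frac{n(n-1)}{2}\operatorname{Var}(h(X_1, X_2)) + 2\cdot\frac{n(n-1)}{2}(n-2)\operatorname{Cov}\bigl(h(X_1, X_2), h(X_1, X_3)\bigr)\\
&= n(n-1)\bigl[\operatorname{tr}^2(\Sigma^2) + \operatorname{tr}(\Sigma^4)\bigr] + 2n(n-1)^2\operatorname{tr}\bigl(\Sigma^2(\Sigma - I)^2\bigr),
\end{align*}
where $\sum^{*}$ runs over unordered pairs of distinct index pairs $\{i, j\}$, $\{i', j'\}$ that share exactly one index. Multiplying both sides with $4n^{-2}(n-1)^{-2}$, we obtain (9). □

A.3 Proof details for Proposition 3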



A.3.1 Proof of (12) 



First of all, we give a formal proof of the representation (12). 



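Proof of (12). The computations made in [7, Appendix] are handy for the proof here. Indeed, we have
\begin{align*}
u_{nk} &= (\mathbb{E}_k - \mathbb{E}_{k-1})\Bigl[\frac{1}{n}\sum_{i=1}^n X_i'X_i\Bigr] = \frac{1}{n}\bigl[X_k'X_k - \operatorname{tr}(\Sigma)\bigr],\\
v_{nk} &= (\mathbb{E}_k - \mathbb{E}_{k-1})\Bigl[\frac{1}{n(n-1)}\sum_{i\ne j}(X_i'X_j)^2\Bigr]\\
&= \frac{2}{n(n-1)}\bigl[X_k'Q_{k-1}X_k - \operatorname{tr}(Q_{k-1}\Sigma)\bigr] + \frac{2}{n}\bigl[X_k'\Sigma X_k - \operatorname{tr}(\Sigma^2)\bigr], \tag{36}
\end{align*}
where $Q_{k-1} = \sum_{i=1}^{k-1}(X_iX_i' - \Sigma)$. Noting that $D_{nk} = v_{nk} - 2u_{nk}$, we obtain (12). □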



A.3.2 Proof of (19) 



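To calculate $\sigma_{nk}^2$, we note that
\[
\sigma_{nk}^2 = \mathbb{E}_{k-1}[D_{nk}^2] = 4\,\mathbb{E}_{k-1}[u_{nk}^2] - 4\,\mathbb{E}_{k-1}[u_{nk}v_{nk}] + \mathbb{E}_{k-1}[v_{nk}^2].
\]
Thus, (19) is immediate with the following lemma.

Lemma 5. For $u_{nk}, v_{nk}$ defined as in (36), we have
\begin{align*}
\mathbb{E}_{k-1}[u_{nk}^2] &= \frac{2}{n^2}\operatorname{tr}(\Sigma^2),\\
\mathbb{E}_{k-1}[u_{nk}v_{nk}] &= \frac{4}{n^2(n-1)}\operatorname{tr}(Q_{k-1}\Sigma^2) + \frac{4}{n^2}\operatorname{tr}(\Sigma^3),\\
\mathbb{E}_{k-1}[v_{nk}^2] &= \frac{8}{n^2(n-1)^2}\operatorname{tr}(Q_{k-1}\Sigma Q_{k-1}\Sigma) + \frac{16}{n^2(n-1)}\operatorname{tr}(Q_{k-1}\Sigma^3) + \frac{8}{n^2}\operatorname{tr}(\Sigma^4).
\end{align*}

Proof. First, we have
\[
\mathbb{E}_{k-1}[u_{nk}^2] = \frac{1}{n^2}\,\mathbb{E}\bigl[(X_k'X_k - \operatorname{tr}(\Sigma))^2\bigr] = \frac{2}{n^2}\operatorname{tr}(\Sigma^2).
\]
Next, we have from (33) that
\begin{align*}
\mathbb{E}_{k-1}[u_{nk}v_{nk}] &= \frac{2}{n^2(n-1)}\,\mathbb{E}_{k-1}\bigl\{[X_k'X_k - \operatorname{tr}(\Sigma)]\,X_k'Q_{k-1}X_k\bigr\} + \frac{2}{n^2}\,\mathbb{E}_{k-1}\bigl\{[X_k'X_k - \operatorname{tr}(\Sigma)]\,X_k'\Sigma X_k\bigr\}\\
&= \frac{2}{n^2(n-1)}\bigl[\operatorname{tr}(\Sigma)\operatorname{tr}(Q_{k-1}\Sigma) + 2\operatorname{tr}(Q_{k-1}\Sigma^2) - \operatorname{tr}(\Sigma)\operatorname{tr}(Q_{k-1}\Sigma)\bigr]\\
&\quad + \frac{2}{n^2}\bigl[\operatorname{tr}(\Sigma)\operatorname{tr}(\Sigma^2) + 2\operatorname{tr}(\Sigma^3) - \operatorname{tr}(\Sigma)\operatorname{tr}(\Sigma^2)\bigr]\\
&= \frac{4}{n^2(n-1)}\operatorname{tr}(Q_{k-1}\Sigma^2) + \frac{4}{n^2}\operatorname{tr}(\Sigma^3).
\end{align*}
Finally, we have
\begin{align*}
\mathbb{E}_{k-1}[v_{nk}^2] &= \frac{4}{n^2(n-1)^2}\,\mathbb{E}_{k-1}\bigl\{[X_k'Q_{k-1}X_k - \operatorname{tr}(Q_{k-1}\Sigma)]\,X_k'Q_{k-1}X_k\bigr\}\\
&\quad + \frac{8}{n^2(n-1)}\,\mathbb{E}_{k-1}\bigl\{[X_k'Q_{k-1}X_k - \operatorname{tr}(Q_{k-1}\Sigma)]\,X_k'\Sigma X_k\bigr\}\\
&\quad + \frac{4}{n^2}\,\mathbb{E}_{k-1}\bigl\{[X_k'\Sigma X_k - \operatorname{tr}(\Sigma^2)]\,X_k'\Sigma X_k\bigr\}.
\end{align*}
Note that
\begin{align*}
\mathbb{E}_{k-1}[X_k'Q_{k-1}X_k\,X_k'Q_{k-1}X_k] &= \mathbb{E}_{k-1}[Z_k'\Gamma'Q_{k-1}\Gamma Z_k\,Z_k'\Gamma'Q_{k-1}\Gamma Z_k]\\
&= \operatorname{tr}^2(\Gamma'Q_{k-1}\Gamma) + 2\operatorname{tr}(\Gamma'Q_{k-1}\Gamma\Gamma'Q_{k-1}\Gamma)\\
&= 2\operatorname{tr}(Q_{k-1}\Sigma Q_{k-1}\Sigma) + \operatorname{tr}^2(Q_{k-1}\Sigma),\\
\mathbb{E}_{k-1}[X_k'Q_{k-1}X_k\,X_k'\Sigma X_k] &= \mathbb{E}_{k-1}[Z_k'\Gamma'Q_{k-1}\Gamma Z_k\,Z_k'\Gamma'\Sigma\Gamma Z_k]\\
&= \operatorname{tr}(\Gamma'Q_{k-1}\Gamma)\operatorname{tr}(\Gamma'\Sigma\Gamma) + 2\operatorname{tr}(\Gamma'Q_{k-1}\Gamma\Gamma'\Sigma\Gamma)\\
&= 2\operatorname{tr}(Q_{k-1}\Sigma^3) + \operatorname{tr}(Q_{k-1}\Sigma)\operatorname{tr}(\Sigma^2),\\
\mathbb{E}_{k-1}[X_k'\Sigma X_k\,X_k'\Sigma X_k] &= \mathbb{E}_{k-1}[Z_k'\Gamma'\Sigma\Gamma Z_k\,Z_k'\Gamma'\Sigma\Gamma Z_k]\\
&= 2\operatorname{tr}(\Gamma'\Sigma\Gamma\Gamma'\Sigma\Gamma) + \operatorname{tr}^2(\Gamma'\Sigma\Gamma)\\
&= 2\operatorname{tr}(\Sigma^4) + \operatorname{tr}^2(\Sigma^2).
\end{align*}
Collecting terms, we complete the proof. □

A.3.3 Proof of Lemma 2

Finally we shall complete the proof of Lemma 2.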
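Recall that $M_{k-1} = A\sum_{i=1}^{k-1}(Z_iZ_i' - I)A$, and so
\begin{align*}
\operatorname{tr}(M_{k-1}^2) &= \sum_{i=1}^{k-1}\operatorname{tr}\bigl(A(Z_iZ_i' - I)A^2(Z_iZ_i' - I)A\bigr)\\
&\quad + 2\sum_{i=1}^{k-2}\sum_{j=i+1}^{k-1}\operatorname{tr}\bigl(A(Z_iZ_i' - I)A^2(Z_jZ_j' - I)A\bigr). \tag{37}
\end{align*}
For any fixed $i$, we have
\begin{align*}
\mathbb{E}\bigl[\operatorname{tr}\bigl(A(Z_iZ_i' - I)A^2(Z_iZ_i' - I)A\bigr)\bigr] &= \mathbb{E}\bigl[\operatorname{tr}(AZ_iZ_i'A^2Z_iZ_i'A)\bigr] - 2\,\mathbb{E}\bigl[\operatorname{tr}(A^2Z_iZ_i'A^2)\bigr] + \operatorname{tr}(A^4)\\
&= \mathbb{E}\bigl[(Z_i'A^2Z_i)^2\bigr] - 2\,\mathbb{E}[Z_i'A^4Z_i] + \operatorname{tr}(A^4)\\
&= 2\operatorname{tr}(A^4) + \operatorname{tr}^2(A^2) - 2\operatorname{tr}(A^4) + \operatorname{tr}(A^4)\\
&= \operatorname{tr}^2(\Sigma^2) + \operatorname{tr}(\Sigma^4).
\end{align*}
On the other hand, for any $i \ne j$, we have
\[
\mathbb{E}\bigl[\operatorname{tr}\bigl(A(Z_iZ_i' - I)A^2(Z_jZ_j' - I)A\bigr)\bigr] = 0.
\]
In addition, we note that the terms in (37) are all uncorrelated. Therefore, we obtain that
\[
\mathbb{E}[\operatorname{tr}(M_{k-1}^2)] = (k-1)\bigl[\operatorname{tr}^2(\Sigma^2) + \operatorname{tr}(\Sigma^4)\bigr].
\]
Moreover, we have
\begin{align*}
\operatorname{Var}[\operatorname{tr}(M_{k-1}^2)] &= \sum_{i=1}^{k-1}\operatorname{Var}\bigl[\operatorname{tr}\bigl(A(Z_iZ_i' - I)A^2(Z_iZ_i' - I)A\bigr)\bigr]\\
&\quad + 4\sum_{i=1}^{k-2}\sum_{j=i+1}^{k-1}\operatorname{Var}\bigl[\operatorname{tr}\bigl(A(Z_iZ_i' - I)A^2(Z_jZ_j' - I)A\bigr)\bigr] \tag{38}\\
&= (k-1)V_1 + 2(k-1)(k-2)V_2.
\end{align*}
Here, $V_1 = \operatorname{Var}[\operatorname{tr}(A(Z_1Z_1' - I)A^2(Z_1Z_1' - I)A)]$ and $V_2 = \operatorname{Var}[\operatorname{tr}(A(Z_1Z_1' - I)A^2(Z_2Z_2' - I)A)]$.

Consider $V_1$ first, for which we have the decomposition
\begin{align*}
V_1 &= \operatorname{Var}\bigl[(Z_1'A^2Z_1)^2 - 2Z_1'A^4Z_1\bigr]\\
&= \operatorname{Var}\bigl[(Z_1'A^2Z_1)^2\bigr] + 4\operatorname{Var}(Z_1'A^4Z_1) - 4\operatorname{Cov}\bigl[(Z_1'A^2Z_1)^2, Z_1'A^4Z_1\bigr].
\end{align*}
To calculate $\operatorname{Var}[(Z_1'A^2Z_1)^2]$, we note that the eigenvalues of $A^2$ are $\lambda_1^2 \ge \cdots \ge \lambda_p^2$, where $\lambda_1 \ge \cdots \ge \lambda_p$ are the eigenvalues of $\Sigma$. Let $U_j \stackrel{iid}{\sim} N(0, 1)$, for $j = 1, \dots, p$. By the moment generating function of the $\chi_1^2$ distribution, we have $\mathbb{E}[U_j^2] = 1$, $\mathbb{E}[U_j^4] = 3$, $\mathbb{E}[U_j^6] = 15$ and $\mathbb{E}[U_j^8] = 105$. Then, expanding the fourth power and collecting terms, we obtain that
\begin{align*}
\mathbb{E}\bigl[(Z_1'A^2Z_1)^4\bigr] &= \mathbb{E}\Bigl[\Bigl(\sum_{j=1}^p\lambda_j^2U_j^2\Bigr)^4\Bigr]\\
&= \operatorname{tr}^4(\Sigma^2) + 12\operatorname{tr}(\Sigma^4)\operatorname{tr}^2(\Sigma^2) + 12\operatorname{tr}^2(\Sigma^4) + 32\operatorname{tr}(\Sigma^6)\operatorname{tr}(\Sigma^2) + 48\operatorname{tr}(\Sigma^8).
\end{align*}
Observing that $\mathbb{E}[(Z_1'A^2Z_1)^2] = 2\operatorname{tr}(\Sigma^4) + \operatorname{tr}^2(\Sigma^2)$, we get
\[
\operatorname{Var}\bigl[(Z_1'A^2Z_1)^2\bigr] = 8\operatorname{tr}(\Sigma^4)\operatorname{tr}^2(\Sigma^2) + 8\operatorname{tr}^2(\Sigma^4) + 32\operatorname{tr}(\Sigma^6)\operatorname{tr}(\Sigma^2) + 48\operatorname{tr}(\Sigma^8).
\]
Next, we compute $\operatorname{Var}(Z_1'A^4Z_1)$, for which we have
\[
\mathbb{E}\bigl[(Z_1'A^4Z_1)^2\bigr] = 2\operatorname{tr}(\Sigma^8) + \operatorname{tr}^2(\Sigma^4), \qquad \mathbb{E}[Z_1'A^4Z_1] = \operatorname{tr}(\Sigma^4).
\]
Therefore, we get $\operatorname{Var}(Z_1'A^4Z_1) = 2\operatorname{tr}(\Sigma^8)$.

Now switch to $\operatorname{Cov}[(Z_1'A^2Z_1)^2, Z_1'A^4Z_1]$. We note that
\begin{align*}
\mathbb{E}\bigl[(Z_1'A^2Z_1)^2\,Z_1'A^4Z_1\bigr]
&= \mathbb{E}\Bigl[\Bigl(\sum_{j=1}^p\lambda_j^4U_j^4 + \sum_{i\ne j}\lambda_i^2\lambda_j^2U_i^2U_j^2\Bigr)\sum_{j=1}^p\lambda_j^4U_j^2\Bigr]\\
&= \mathbb{E}\Bigl[\sum_{j=1}^p\lambda_j^8U_j^6 + \sum_{i\ne j}\lambda_i^4\lambda_j^4U_i^4U_j^2 + 2\sum_{i\ne j}\lambda_i^6\lambda_j^2U_i^4U_j^2 + \sum_{i\ne j\ne l}\lambda_i^2\lambda_j^2\lambda_l^4U_i^2U_j^2U_l^2\Bigr]\\
&= 15\sum_{j=1}^p\lambda_j^8 + 3\sum_{i\ne j}\lambda_i^4\lambda_j^4 + 6\sum_{i\ne j}\lambda_i^6\lambda_j^2 + \sum_{i\ne j\ne l}\lambda_i^2\lambda_j^2\lambda_l^4\\
&= \operatorname{tr}(\Sigma^4)\operatorname{tr}^2(\Sigma^2) + 4\operatorname{tr}(\Sigma^6)\operatorname{tr}(\Sigma^2) + 2\operatorname{tr}^2(\Sigma^4) + 8\operatorname{tr}(\Sigma^8),
\end{align*}
where $\sum_{i\ne j\ne l}$ denotes the sum over mutually distinct indices. By the previous expressions for $\mathbb{E}[(Z_1'A^2Z_1)^2]$ and $\mathbb{E}[Z_1'A^4Z_1]$, we obtain that
\[
\operatorname{Cov}\bigl[(Z_1'A^2Z_1)^2, Z_1'A^4Z_1\bigr] = 4\operatorname{tr}(\Sigma^6)\operatorname{tr}(\Sigma^2) + 8\operatorname{tr}(\Sigma^8).
\]
Finally, we obtain that
\[
V_1 = 8\operatorname{tr}(\Sigma^4)\operatorname{tr}^2(\Sigma^2) + 8\operatorname{tr}^2(\Sigma^4) + 16\operatorname{tr}(\Sigma^6)\operatorname{tr}(\Sigma^2) + 24\operatorname{tr}(\Sigma^8). \tag{39}
\]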
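Formula (39) can be sanity-checked in dimension two. The sketch below is illustrative: the eigenvalues are hypothetical, $A$ is taken diagonal (which is without loss of generality for this variance by rotation invariance of $Z_1$), and the Gaussian expectations are evaluated by Gauss-Hermite quadrature, which is exact for the polynomial integrands involved.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

lam = np.array([1.5, 0.7])                  # hypothetical eigenvalues of Sigma (p = 2)

nodes, weights = hermegauss(12)             # weight function exp(-x^2 / 2)
probs = weights / np.sqrt(2.0 * np.pi)      # rescale so the weights sum to one
U1, U2 = np.meshgrid(nodes, nodes, indexing="ij")
P = np.outer(probs, probs)                  # joint quadrature weights for (U1, U2)

q2 = lam[0] ** 2 * U1 ** 2 + lam[1] ** 2 * U2 ** 2   # Z' A^2 Z
q4 = lam[0] ** 4 * U1 ** 2 + lam[1] ** 4 * U2 ** 2   # Z' A^4 Z
W = q2 ** 2 - 2.0 * q4                      # random part of tr(A(ZZ'-I)A^2(ZZ'-I)A)

v1_quadrature = np.sum(P * W ** 2) - np.sum(P * W) ** 2


def tr_pow(r):
    """tr(Sigma^r) computed from the eigenvalues."""
    return float(np.sum(lam ** r))


v1_formula = (8 * tr_pow(4) * tr_pow(2) ** 2 + 8 * tr_pow(4) ** 2
              + 16 * tr_pow(6) * tr_pow(2) + 24 * tr_pow(8))
print(v1_quadrature, v1_formula)
```

For a single eigenvalue equal to one the same computation reduces to $\operatorname{Var}(U^4 - 2U^2) = 56$, which agrees with $8 + 8 + 16 + 24$.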

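Switch to the calculation of $V_2$. We first note that
\begin{align*}
V_2 &= \operatorname{Var}\bigl[\operatorname{tr}\bigl(A(Z_1Z_1' - I)A^2(Z_2Z_2' - I)A\bigr)\bigr]\\
&= \operatorname{Var}\bigl[\operatorname{tr}(AZ_1Z_1'A^2Z_2Z_2'A) - \operatorname{tr}(A^2Z_1Z_1'A^2) - \operatorname{tr}(A^2Z_2Z_2'A^2)\bigr]\\
&= \operatorname{Var}\bigl[(Z_1'A^2Z_2)^2\bigr] + 2\operatorname{Var}(Z_1'A^4Z_1) - 2\operatorname{Cov}\bigl[(Z_1'A^2Z_2)^2, Z_1'A^4Z_1\bigr].
\end{align*}
Note that $\mathbb{E}[(Z_1'A^2Z_2)^4] = 3\operatorname{tr}^2(\Sigma^4) + 6\operatorname{tr}(\Sigma^8)$, and $\mathbb{E}[(Z_1'A^2Z_2)^2] = \operatorname{tr}(\Sigma^4)$. We then get
\[
\operatorname{Var}\bigl[(Z_1'A^2Z_2)^2\bigr] = 2\operatorname{tr}^2(\Sigma^4) + 6\operatorname{tr}(\Sigma^8).
\]
In addition, the previous calculation gives $\operatorname{Var}(Z_1'A^4Z_1) = 2\operatorname{tr}(\Sigma^8)$. Then for $\operatorname{Cov}[(Z_1'A^2Z_2)^2, Z_1'A^4Z_1]$, we have
\begin{align*}
\mathbb{E}\bigl[(Z_1'A^2Z_2)^2\,Z_1'A^4Z_1\bigr] &= \mathbb{E}\Bigl[\Bigl(\sum_{j=1}^p\lambda_j^2U_jV_j\Bigr)^2\sum_{j=1}^p\lambda_j^4U_j^2\Bigr]\\
&= \mathbb{E}\Bigl[\sum_{j=1}^p\lambda_j^8U_j^4V_j^2 + \sum_{i\ne j}\lambda_i^4\lambda_j^4U_i^2V_i^2U_j^2\Bigr]\\
&= 3\sum_{j=1}^p\lambda_j^8 + \sum_{i\ne j}\lambda_i^4\lambda_j^4\\
&= \operatorname{tr}^2(\Sigma^4) + 2\operatorname{tr}(\Sigma^8).
\end{align*}
This leads to the conclusion that $\operatorname{Cov}[(Z_1'A^2Z_2)^2, Z_1'A^4Z_1] = 2\operatorname{tr}(\Sigma^8)$, and so
\[
V_2 = 2\operatorname{tr}^2(\Sigma^4) + 6\operatorname{tr}(\Sigma^8). \tag{40}
\]
Replacing $V_1$ and $V_2$ in (38) by (39) and (40), we obtain the claimed formula for $\operatorname{Var}[\operatorname{tr}(M_{k-1}^2)]$.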